Multivariate Time Series Models: ARIMAX/SARIMAX/VAR

After fitting univariate time series models, it is clear that greater complexity may be needed to fit models that better understand the time series data. This section will include the fitting of five different multivariate time series models, including ARIMAX and VAR models. The best parameters for each model will be chosen by comparing RMSE values. This will provide insight into which factors have the greatest influence on public transit ridership according to the data, which is crucial for addressing many of the questions identified on earlier pages. From this analysis, we hope to use exogenous variables as additional information for forecasting.

Note: Much of the code and process on this page is re-purposed from:

Gamage, P. (2026). Applied Time Series for Data Science.

Code
library(tidyverse)
library(ggplot2)
library(forecast)
library(astsa) 
library(xts)
library(tseries)
library(fpp2)
library(fma)
library(lubridate)
library(tidyverse)
library(TSstudio)
library(quantmod)
library(tidyquant)
library(plotly)
library(vars)
library(dplyr)
library(patchwork)

Literature Review

Article 1: Telework and the day-to-day variability of travel behaviour: The specificities of Fridays

This article details the phenomenon of telework increasing drastically in the wake of COVID-19, and the externalities of this change. Specifically, telework has great impact on both the environment and public transportation, both of which are considerations for assessing the efficacy of remote work. The authors suggest that the prevalence of telework poses a threat to the economic viability of public transit systems. Thus, this section will evaluate the relationship between public transit usage and the employment landscape to address the challenges outlined in the article.

Link: https://www.sciencedirect.com/science/article/pii/S1361920924002025

Article 2: Modeling car ownership and use in a developing country context with informal public transportation

This article evaluates the ability for public transportation to exist in a society that is heavily reliant on cars. As we know, transitions from private automobile transportation to public transportation can result in many advantages for a municipality. However, the authors surmise that “only if major improvements to the public transportation services are enacted would a decrease in car ownership and usage be achieved” in societies that are already so car-focused. Therefore, this section will attempt to investigate the relationship between people’s decisions to operate cars, use rideshare services, or take public transportation to evaluate the ability for these methods to coexist.

Link: https://link.springer.com/article/10.1007/s11116-020-10161-5

Model 1:

In this model, we will evaluate the relationships between rideshare usage, vehicle miles, and public transit ridership. The purpose of this is to see the interaction between three of the most common options for transportation, and how each of these time series can be used as predictors for the others. Ultimately, consumers often have choices between utilizing these three options, or are inclined to use one over the others due to convenience, price, or speed. This model should help us understand how they relate.

Variables

VAR: Rideshare Usage, Vehicle-Miles, Public Transit Ridership

Question: How do the usages of different forms of transportation interact with one another?

Model Fitting

Code
rideshare <- read_csv("../code/uber.csv")
ridership <- read_csv("../data/ridership.csv")
miles <- read_csv("../data/monthly_miles.csv")
Code
ridership_ts <- ts(ridership$`Total Ridership (000s)`, start=c(1992,1), frequency = 4)
uber_ts <- ts(rideshare$Rides, start=c(2017,1), frequency = 4)
miles <- miles[c(554:nrow(miles)),]
miles_ts <- ts(miles$M12MTVUSM227NFWA, start=c(2017,1), frequency = 12)

miles_ts <- aggregate(miles_ts, nfrequency = 4, FUN = mean)

start_year <- 2017
start_quarter <- 1
end_year <- 2024
end_quarter <- 1

miles_ts_trimmed <- window(miles_ts, 
                           start = c(start_year, start_quarter), 
                           end = c(end_year, end_quarter))

uber_ts_trimmed <- window(uber_ts, 
                          start = c(start_year, start_quarter), 
                          end = c(end_year, end_quarter))

ridership_ts_trimmed <- window(ridership_ts, 
                               start = c(start_year, start_quarter), 
                               end = c(end_year, end_quarter))
Code
combined_ts <- cbind(miles_ts_trimmed, uber_ts_trimmed, ridership_ts_trimmed)
combined_scaled <- scale(combined_ts)
combined_scaled_ts <- ts(combined_scaled, 
                         start = start(combined_ts), 
                         frequency = 4)
VARselect(combined_scaled_ts, lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     6      6      6      5 

$criteria
                   1             2             3             4             5
AIC(n) -7.5347492749 -8.1576559112 -8.9090278431 -1.114224e+01           NaN
HQ(n)  -7.4085624862 -7.9557570493 -8.6314169080 -1.078891e+01           NaN
SC(n)  -6.7891395545 -6.9646803586 -7.2686864583 -9.054531e+00           NaN
FPE(n)  0.0005549086  0.0003387843  0.0002209786  5.007035e-05 -2.258884e-20
          6    7    8    9   10
AIC(n) -Inf -Inf -Inf -Inf -Inf
HQ(n)  -Inf -Inf -Inf -Inf -Inf
SC(n)  -Inf -Inf -Inf -Inf -Inf
FPE(n)    0    0    0    0    0

According to this model fitting, \(p=6\) is sufficient. We will compare with \(p=1\) to ensure the best model is chosen.

Code
summary(vars::VAR(combined_scaled_ts, p=5, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: miles_ts_trimmed, uber_ts_trimmed, ridership_ts_trimmed 
Deterministic variables: both 
Sample size: 24 
Log Likelihood: 89.216 
Roots of the characteristic polynomial:
1.025 1.025 0.9796 0.9796 0.9735 0.9735 0.9723 0.9723 0.9298 0.8901 0.8901 0.8269 0.8269 0.4651 0.4651
Call:
vars::VAR(y = combined_scaled_ts, p = 5, type = "both")


Estimation results for equation miles_ts_trimmed: 
================================================= 
miles_ts_trimmed = miles_ts_trimmed.l1 + uber_ts_trimmed.l1 + ridership_ts_trimmed.l1 + miles_ts_trimmed.l2 + uber_ts_trimmed.l2 + ridership_ts_trimmed.l2 + miles_ts_trimmed.l3 + uber_ts_trimmed.l3 + ridership_ts_trimmed.l3 + miles_ts_trimmed.l4 + uber_ts_trimmed.l4 + ridership_ts_trimmed.l4 + miles_ts_trimmed.l5 + uber_ts_trimmed.l5 + ridership_ts_trimmed.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)  
miles_ts_trimmed.l1     -2.89456    2.33063  -1.242   0.2542  
uber_ts_trimmed.l1       0.82099    1.47119   0.558   0.5942  
ridership_ts_trimmed.l1  1.83453    1.58624   1.157   0.2854  
miles_ts_trimmed.l2      1.35798    1.22617   1.108   0.3047  
uber_ts_trimmed.l2      -0.82739    1.25425  -0.660   0.5306  
ridership_ts_trimmed.l2  1.38147    1.13216   1.220   0.2619  
miles_ts_trimmed.l3      0.15230    0.51631   0.295   0.7766  
uber_ts_trimmed.l3       0.85419    1.26354   0.676   0.5207  
ridership_ts_trimmed.l3 -0.11980    1.07524  -0.111   0.9144  
miles_ts_trimmed.l4      0.38873    0.46046   0.844   0.4264  
uber_ts_trimmed.l4       0.05763    1.24510   0.046   0.9644  
ridership_ts_trimmed.l4 -0.05485    1.39568  -0.039   0.9697  
miles_ts_trimmed.l5     -0.82421    0.44572  -1.849   0.1069  
uber_ts_trimmed.l5      -1.57424    1.41591  -1.112   0.3029  
ridership_ts_trimmed.l5 -0.24005    1.65662  -0.145   0.8889  
const                   -4.36114    2.21934  -1.965   0.0901 .
trend                    0.23743    0.12554   1.891   0.1005  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.3498 on 7 degrees of freedom
Multiple R-Squared: 0.9689, Adjusted R-squared: 0.8977 
F-statistic: 13.61 on 16 and 7 DF,  p-value: 0.0009113 


Estimation results for equation uber_ts_trimmed: 
================================================ 
uber_ts_trimmed = miles_ts_trimmed.l1 + uber_ts_trimmed.l1 + ridership_ts_trimmed.l1 + miles_ts_trimmed.l2 + uber_ts_trimmed.l2 + ridership_ts_trimmed.l2 + miles_ts_trimmed.l3 + uber_ts_trimmed.l3 + ridership_ts_trimmed.l3 + miles_ts_trimmed.l4 + uber_ts_trimmed.l4 + ridership_ts_trimmed.l4 + miles_ts_trimmed.l5 + uber_ts_trimmed.l5 + ridership_ts_trimmed.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)  
miles_ts_trimmed.l1      -3.7711     2.7660  -1.363   0.2150  
uber_ts_trimmed.l1        1.4836     1.7460   0.850   0.4236  
ridership_ts_trimmed.l1   1.4483     1.8826   0.769   0.4668  
miles_ts_trimmed.l2       1.5115     1.4552   1.039   0.3335  
uber_ts_trimmed.l2       -1.0213     1.4885  -0.686   0.5147  
ridership_ts_trimmed.l2   1.5168     1.3436   1.129   0.2962  
miles_ts_trimmed.l3       0.1540     0.6128   0.251   0.8088  
uber_ts_trimmed.l3        1.1162     1.4996   0.744   0.4809  
ridership_ts_trimmed.l3  -0.5552     1.2761  -0.435   0.6766  
miles_ts_trimmed.l4       0.6124     0.5465   1.121   0.2994  
uber_ts_trimmed.l4        0.6089     1.4777   0.412   0.6926  
ridership_ts_trimmed.l4   0.1008     1.6564   0.061   0.9532  
miles_ts_trimmed.l5      -1.0993     0.5290  -2.078   0.0763 .
uber_ts_trimmed.l5       -2.5395     1.6804  -1.511   0.1745  
ridership_ts_trimmed.l5   0.4220     1.9661   0.215   0.8362  
const                    -6.0836     2.6339  -2.310   0.0542 .
trend                     0.3308     0.1490   2.220   0.0618 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.4151 on 7 degrees of freedom
Multiple R-Squared: 0.9261, Adjusted R-squared: 0.7571 
F-statistic: 5.481 on 16 and 7 DF,  p-value: 0.01479 


Estimation results for equation ridership_ts_trimmed: 
===================================================== 
ridership_ts_trimmed = miles_ts_trimmed.l1 + uber_ts_trimmed.l1 + ridership_ts_trimmed.l1 + miles_ts_trimmed.l2 + uber_ts_trimmed.l2 + ridership_ts_trimmed.l2 + miles_ts_trimmed.l3 + uber_ts_trimmed.l3 + ridership_ts_trimmed.l3 + miles_ts_trimmed.l4 + uber_ts_trimmed.l4 + ridership_ts_trimmed.l4 + miles_ts_trimmed.l5 + uber_ts_trimmed.l5 + ridership_ts_trimmed.l5 + const + trend 

                         Estimate Std. Error t value Pr(>|t|)  
miles_ts_trimmed.l1     -5.388318   3.756220  -1.435   0.1946  
uber_ts_trimmed.l1       1.506373   2.371093   0.635   0.5454  
ridership_ts_trimmed.l1  2.604635   2.556516   1.019   0.3422  
miles_ts_trimmed.l2      2.449705   1.976193   1.240   0.2551  
uber_ts_trimmed.l2      -1.481239   2.021455  -0.733   0.4875  
ridership_ts_trimmed.l2  2.110733   1.824684   1.157   0.2853  
miles_ts_trimmed.l3      0.247200   0.832129   0.297   0.7750  
uber_ts_trimmed.l3       1.617436   2.036424   0.794   0.4531  
ridership_ts_trimmed.l3 -0.909495   1.732948  -0.525   0.6159  
miles_ts_trimmed.l4      0.825068   0.742107   1.112   0.3029  
uber_ts_trimmed.l4      -0.120253   2.006700  -0.060   0.9539  
ridership_ts_trimmed.l4  0.640952   2.249391   0.285   0.7839  
miles_ts_trimmed.l5     -1.436614   0.718350  -2.000   0.0856 .
uber_ts_trimmed.l5      -3.147909   2.281995  -1.379   0.2102  
ridership_ts_trimmed.l5 -0.008031   2.669940  -0.003   0.9977  
const                   -7.765594   3.576866  -2.171   0.0665 .
trend                    0.396662   0.202333   1.960   0.0908 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.5638 on 7 degrees of freedom
Multiple R-Squared: 0.8978, Adjusted R-squared: 0.6643 
F-statistic: 3.845 on 16 and 7 DF,  p-value: 0.03908 



Covariance matrix of residuals:
                     miles_ts_trimmed uber_ts_trimmed ridership_ts_trimmed
miles_ts_trimmed               0.1224          0.1425               0.1951
uber_ts_trimmed                0.1425          0.1723               0.2292
ridership_ts_trimmed           0.1951          0.2292               0.3178

Correlation matrix of residuals:
                     miles_ts_trimmed uber_ts_trimmed ridership_ts_trimmed
miles_ts_trimmed               1.0000          0.9813               0.9894
uber_ts_trimmed                0.9813          1.0000               0.9793
ridership_ts_trimmed           0.9894          0.9793               1.0000
Code
summary(vars::VAR(combined_ts, p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: miles_ts_trimmed, uber_ts_trimmed, ridership_ts_trimmed 
Deterministic variables: both 
Sample size: 28 
Log Likelihood: -782.654 
Roots of the characteristic polynomial:
0.943 0.5921 0.5921
Call:
vars::VAR(y = combined_ts, p = 1, type = "both")


Estimation results for equation miles_ts_trimmed: 
================================================= 
miles_ts_trimmed = miles_ts_trimmed.l1 + uber_ts_trimmed.l1 + ridership_ts_trimmed.l1 + const + trend 

                          Estimate Std. Error t value Pr(>|t|)   
miles_ts_trimmed.l1      2.768e-01  2.245e-01   1.233  0.22994   
uber_ts_trimmed.l1      -8.057e+02  1.854e+03  -0.435  0.66791   
ridership_ts_trimmed.l1  1.637e-01  7.577e-02   2.160  0.04144 * 
const                    1.918e+06  6.007e+05   3.192  0.00405 **
trend                    9.471e+03  8.569e+03   1.105  0.28047   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 48760 on 23 degrees of freedom
Multiple R-Squared: 0.8315, Adjusted R-squared: 0.8021 
F-statistic: 28.37 on 4 and 23 DF,  p-value: 1.352e-08 


Estimation results for equation uber_ts_trimmed: 
================================================ 
uber_ts_trimmed = miles_ts_trimmed.l1 + uber_ts_trimmed.l1 + ridership_ts_trimmed.l1 + const + trend 

                          Estimate Std. Error t value Pr(>|t|)  
miles_ts_trimmed.l1     -5.785e-05  4.806e-05  -1.204   0.2410  
uber_ts_trimmed.l1       1.261e-02  3.970e-01   0.032   0.9749  
ridership_ts_trimmed.l1  2.479e-05  1.622e-05   1.528   0.1402  
const                    1.755e+02  1.286e+02   1.364   0.1857  
trend                    4.011e+00  1.835e+00   2.186   0.0392 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 10.44 on 23 degrees of freedom
Multiple R-Squared: 0.877,  Adjusted R-squared: 0.8556 
F-statistic: 40.98 on 4 and 23 DF,  p-value: 3.805e-10 


Estimation results for equation ridership_ts_trimmed: 
===================================================== 
ridership_ts_trimmed = miles_ts_trimmed.l1 + uber_ts_trimmed.l1 + ridership_ts_trimmed.l1 + const + trend 

                          Estimate Std. Error t value Pr(>|t|)   
miles_ts_trimmed.l1     -1.994e+00  1.431e+00  -1.393  0.17702   
uber_ts_trimmed.l1      -2.198e+04  1.182e+04  -1.859  0.07593 . 
ridership_ts_trimmed.l1  1.673e+00  4.832e-01   3.463  0.00211 **
const                    5.667e+06  3.831e+06   1.479  0.15265   
trend                    9.850e+04  5.465e+04   1.802  0.08460 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 310900 on 23 degrees of freedom
Multiple R-Squared: 0.7788, Adjusted R-squared: 0.7403 
F-statistic: 20.25 on 4 and 23 DF,  p-value: 2.903e-07 



Covariance matrix of residuals:
                     miles_ts_trimmed uber_ts_trimmed ridership_ts_trimmed
miles_ts_trimmed            2.378e+09          325936            1.209e+10
uber_ts_trimmed             3.259e+05             109            3.063e+06
ridership_ts_trimmed        1.209e+10         3062638            9.669e+10

Correlation matrix of residuals:
                     miles_ts_trimmed uber_ts_trimmed ridership_ts_trimmed
miles_ts_trimmed               1.0000          0.6403               0.7975
uber_ts_trimmed                0.6403          1.0000               0.9434
ridership_ts_trimmed           0.7975          0.9434               1.0000

Based on the correlation matrices of residuals, using \(p=1\) appears to be more effective.

Cross Validation

Code
n <- length(combined_scaled_ts[,1])
k <- floor(n / 2 / 2) * 2
h <- 4 
n_k <- n - k
q_steps <- n_k / h

rmse1 <- matrix(NA, n_k, 3)
rmse2 <- matrix(NA, n_k, 3)

st <- tsp(combined_scaled_ts)[1] + (k - 1) / 4  

# Cross-validation loop for time series forecasting
for (i in 1:q_steps) {  
  # Define training and testing windows
  xtrain <- window(combined_scaled_ts, end = st + i - 1)
  xtest <- window(combined_scaled_ts, start = st + (i - 1) + 1/4, end = st + i)
  
  ######## First Model (VAR with p=1) ########
  fit1 <- vars::VAR(xtrain, p = 1, type = "both")
  fcast1 <- predict(fit1, n.ahead = 2)
  
  f_miles <- fcast1$fcst$miles_ts_trimmed[, 1]
  f_uber <- fcast1$fcst$uber_ts_trimmed[, 1]
  f_ridership  <- fcast1$fcst$ridership_ts_trimmed[, 1]
  
  ff1 <- data.frame(f_miles, f_uber, f_ridership)
  
  year <- st + (i - 1) + 1/4

  ff1 <- ts(ff1, start = c(year, 1), frequency = 4)

  a <- min(4 * i - 3)
  b <- min(4 * i)
  
  rmse1[c(a:b), ] <- sqrt((ff1 - xtest) ^ 2)
  
  fit2 <- vars::VAR(xtrain, p = 5, type = "both")
  fcast2 <- predict(fit2, n.ahead = 2)
  
  f_miles2 <- fcast2$fcst$miles_ts_trimmed[, 1]
  f_uber2 <- fcast2$fcst$uber_ts_trimmed[, 1]
  f_ridership2  <- fcast2$fcst$ridership_ts_trimmed[, 1]
  
  ff2 <- data.frame(f_miles2, f_uber2, f_ridership2)
  
  year <- st + (i - 1) + 1/4

  ff2 <- ts(ff2, start = c(year, 1), frequency = 4)
  
  a <- min(4 * i - 3)
  b <- min(4 * i)

  rmse2[c(a:b), ] <- sqrt((ff2 - xtest) ^ 2)
}

print(paste("Model 1 (p=1): ",  mean(rmse1, na.rm = TRUE)))
[1] "Model 1 (p=1):  49.1454416586549"
Code
print(paste("Model 2 (p=5): ",  mean(rmse2, na.rm = TRUE)))
[1] "Model 2 (p=5):  3.76364519519523"

Based on these RMSE values, it appears that \(p=5\) is the best model.

Chosen Model

My chosen model is VAR(5).

Forecasting

Code
library(patchwork)

fit <- VAR(combined_ts, p = 5, type = "both")

fit_pr <- predict(fit, n.ahead = 10, ci = 0.95)

actual_data <- as.data.frame(combined_ts)
actual_data$Time <- as.numeric(time(combined_ts))

f_ridership <- fit_pr$fcst$ridership_ts_trimmed[, "fcst"]
f_miles <- fit_pr$fcst$miles_ts_trimmed[, "fcst"]
f_uber <- fit_pr$fcst$uber_ts_trimmed[, "fcst"]

ridership_lower <- fit_pr$fcst$ridership_ts_trimmed[, "lower"]
ridership_upper <- fit_pr$fcst$ridership_ts_trimmed[, "upper"]

miles_lower <- fit_pr$fcst$miles_ts_trimmed[, "lower"]
miles_upper <- fit_pr$fcst$miles_ts_trimmed[, "upper"]

uber_lower  <- fit_pr$fcst$uber_ts_trimmed[, "lower"]
uber_upper  <- fit_pr$fcst$uber_ts_trimmed[, "upper"]

# Create future time points as numeric
time_index <- as.numeric(seq(from = end(combined_ts)[1] + 1, length.out = 10))

# Create data frames for forecasts
forecast_ridership <- data.frame(Time = time_index, Forecast = f_ridership, Lower = ridership_lower, Upper = ridership_upper)
forecast_miles <- data.frame(Time = time_index, Forecast = f_miles, Lower = miles_lower, Upper = miles_upper)
forecast_uber  <- data.frame(Time = time_index, Forecast = f_uber, Lower = uber_lower, Upper = uber_upper)

p1 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = ridership_ts_trimmed), color = "blue") +
  geom_ribbon(data = forecast_ridership, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "blue", alpha = 0.2) +
  geom_line(data = forecast_ridership, aes(x = Time, y = Forecast), color = "blue", linetype = "dashed") +
  labs(title = "Ridership Forecast", x = "Time", y = "Public Transit Ridership") +
  theme_minimal()

p2 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = miles_ts_trimmed), color = "red") +
  geom_ribbon(data = forecast_miles, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "red", alpha = 0.2) +
  geom_line(data = forecast_miles, aes(x = Time, y = Forecast), color = "red", linetype = "dashed") +
  labs(title = "Vehicle Miles Forecast", x = "Time", y = "Vehicle Miles") +
  theme_minimal()

p3 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = uber_ts_trimmed), color = "green") +
  geom_ribbon(data = forecast_uber, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "green", alpha = 0.2) +
  geom_line(data = forecast_uber, aes(x = Time, y = Forecast), color = "green", linetype = "dashed") +
  labs(title = "Rideshare Users Forecast", x = "Time", y = "Rideshare Users") +
  theme_minimal()

# Arrange plots using patchwork
(p1 / p2) / p3

These forecasts show that both public transit ridership and total automobile miles driven are expected to slightly increase in the short-term, prior to a downturn. Meanwhile, rideshare users are expected to stagnate. It is reasonable to assume that given the small timeframe given in training, as well as COVID being responsible for a great deal of the existing data, this is not particularly convincing. Additionally, the 95% confidence bands do little to indicate that these are particularly confident forecasts.

Model 2:

In this model, we will evaluate the relationships between all transit types, of which there are six in the provided data. The purpose of this is to see the interaction between the common types of public transportation, and how each of these time series can be used as predictors for the others. This should help us understand whether certain forms of transportation trend differently, and which factors may influence that.

Variables

VAR: Ridership of all Transit Types

Question: Is there a relationship between the ridership of different transit types (bus, rail, etc.)?

Model Fitting

Code
heavy_rail_ts <- ts(ridership$`Heavy Rail (000s)`, start=c(1992,1), frequency = 4)
light_rail_ts <- ts(ridership$`Light Rail (000s)`, start=c(1992,1), frequency = 4)
commuter_rail_ts <- ts(ridership$`Commuter Rail (000s)`, start=c(1992,1), frequency = 4)
trolleybus_ts <- ts(ridership$`Trolleybus (000s)`, start=c(1992,1), frequency = 4)
bus_ts <- ts(ridership$`Bus (000s)`, start=c(1992,1), frequency = 4)
demand_response_ts <- ts(ridership$`Demand Response (000s)`, start=c(1992,1), frequency = 4)
Code
modes_ts <- cbind(heavy_rail_ts, light_rail_ts, commuter_rail_ts, trolleybus_ts, bus_ts, demand_response_ts)
modes_ts <- na.remove(modes_ts)
VARselect(modes_ts, lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     5      1      1      5 

$criteria
                  1            2            3            4            5
AIC(n) 1.047944e+02 1.045979e+02 1.044448e+02 1.043803e+02 1.039560e+02
HQ(n)  1.052496e+02 1.053945e+02 1.055828e+02 1.058597e+02 1.057768e+02
SC(n)  1.059154e+02 1.065596e+02 1.072473e+02 1.080235e+02 1.084400e+02
FPE(n) 3.252105e+45 2.686178e+45 2.334546e+45 2.241073e+45 1.523538e+45
                  6            7            8            9           10
AIC(n) 1.040252e+02 1.039675e+02 1.041115e+02 1.042143e+02 1.042399e+02
HQ(n)  1.061874e+02 1.064711e+02 1.069565e+02 1.074007e+02 1.077677e+02
SC(n)  1.093499e+02 1.101330e+02 1.111177e+02 1.120613e+02 1.129276e+02
FPE(n) 1.729277e+45 1.771470e+45 2.288473e+45 2.947133e+45 3.684626e+45

Based on these results, we will test \(p=1\) and \(p=5\).

Code
summary(vars::VAR(modes_ts, p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: heavy_rail_ts, light_rail_ts, commuter_rail_ts, trolleybus_ts, bus_ts, demand_response_ts 
Deterministic variables: both 
Sample size: 128 
Log Likelihood: -7770.928 
Roots of the characteristic polynomial:
0.9401 0.8456 0.8456 0.6508 0.5116 0.09146
Call:
vars::VAR(y = modes_ts, p = 1, type = "both")


Estimation results for equation heavy_rail_ts: 
============================================== 
heavy_rail_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)   
heavy_rail_ts.l1       3.614e-01  3.005e-01   1.203  0.23143   
light_rail_ts.l1       3.220e+00  1.300e+00   2.476  0.01468 * 
commuter_rail_ts.l1   -2.424e+00  1.835e+00  -1.321  0.18899   
trolleybus_ts.l1       1.400e+01  4.294e+00   3.261  0.00144 **
bus_ts.l1              2.925e-02  8.904e-02   0.328  0.74313   
demand_response_ts.l1  3.116e+00  2.471e+00   1.261  0.20989   
const                 -1.685e+05  1.177e+05  -1.432  0.15484   
trend                  1.303e+03  7.999e+02   1.629  0.10584   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 69280 on 120 degrees of freedom
Multiple R-Squared: 0.8722, Adjusted R-squared: 0.8647 
F-statistic:   117 on 7 and 120 DF,  p-value: < 2.2e-16 


Estimation results for equation light_rail_ts: 
============================================== 
light_rail_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1       1.096e-02  3.602e-02   0.304   0.7613    
light_rail_ts.l1       9.634e-01  1.559e-01   6.181 9.04e-09 ***
commuter_rail_ts.l1   -4.551e-01  2.200e-01  -2.069   0.0407 *  
trolleybus_ts.l1       1.331e+00  5.146e-01   2.587   0.0109 *  
bus_ts.l1              1.353e-02  1.067e-02   1.268   0.2073    
demand_response_ts.l1  2.294e-01  2.962e-01   0.775   0.4401    
const                 -2.904e+04  1.411e+04  -2.059   0.0417 *  
trend                  1.918e+02  9.586e+01   2.001   0.0477 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 8303 on 120 degrees of freedom
Multiple R-Squared: 0.9278, Adjusted R-squared: 0.9236 
F-statistic: 220.2 on 7 and 120 DF,  p-value: < 2.2e-16 


Estimation results for equation commuter_rail_ts: 
================================================= 
commuter_rail_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1      -3.897e-02  3.743e-02  -1.041   0.2999    
light_rail_ts.l1       3.405e-01  1.620e-01   2.102   0.0377 *  
commuter_rail_ts.l1    4.496e-01  2.286e-01   1.967   0.0515 .  
trolleybus_ts.l1       2.340e+00  5.349e-01   4.376  2.6e-05 ***
bus_ts.l1              2.628e-03  1.109e-02   0.237   0.8131    
demand_response_ts.l1  2.429e-01  3.078e-01   0.789   0.4316    
const                 -3.300e+04  1.466e+04  -2.251   0.0262 *  
trend                  2.395e+02  9.963e+01   2.404   0.0178 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 8629 on 120 degrees of freedom
Multiple R-Squared: 0.8627, Adjusted R-squared: 0.8547 
F-statistic: 107.7 on 7 and 120 DF,  p-value: < 2.2e-16 


Estimation results for equation trolleybus_ts: 
============================================== 
trolleybus_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1       1.029e-02  8.598e-03   1.197  0.23369    
light_rail_ts.l1       6.654e-03  3.721e-02   0.179  0.85836    
commuter_rail_ts.l1   -4.159e-02  5.251e-02  -0.792  0.42989    
trolleybus_ts.l1       4.139e-01  1.229e-01   3.369  0.00102 ** 
bus_ts.l1              2.511e-03  2.548e-03   0.986  0.32620    
demand_response_ts.l1 -2.116e-02  7.071e-02  -0.299  0.76530    
const                  1.434e+04  3.367e+03   4.260 4.09e-05 ***
trend                 -9.802e+01  2.289e+01  -4.283 3.74e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1982 on 120 degrees of freedom
Multiple R-Squared: 0.9121, Adjusted R-squared: 0.907 
F-statistic: 177.9 on 7 and 120 DF,  p-value: < 2.2e-16 


Estimation results for equation bus_ts: 
======================================= 
bus_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1      -3.750e-01  2.988e-01  -1.255 0.211883    
light_rail_ts.l1       2.337e+00  1.293e+00   1.807 0.073227 .  
commuter_rail_ts.l1   -1.691e+00  1.825e+00  -0.927 0.355974    
trolleybus_ts.l1       1.599e+01  4.269e+00   3.745 0.000279 ***
bus_ts.l1              8.521e-01  8.853e-02   9.625  < 2e-16 ***
demand_response_ts.l1  1.358e+00  2.457e+00   0.553 0.581403    
const                 -1.140e+05  1.170e+05  -0.974 0.332072    
trend                  1.243e+03  7.953e+02   1.563 0.120757    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 68880 on 120 degrees of freedom
Multiple R-Squared: 0.9116, Adjusted R-squared: 0.9065 
F-statistic: 176.8 on 7 and 120 DF,  p-value: < 2.2e-16 


Estimation results for equation demand_response_ts: 
=================================================== 
demand_response_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1       1.526e-03  1.498e-02   0.102  0.91905    
light_rail_ts.l1       1.528e-01  6.482e-02   2.358  0.02000 *  
commuter_rail_ts.l1   -2.454e-01  9.148e-02  -2.683  0.00833 ** 
trolleybus_ts.l1       4.559e-01  2.140e-01   2.130  0.03522 *  
bus_ts.l1              6.396e-03  4.438e-03   1.441  0.15214    
demand_response_ts.l1  8.428e-01  1.232e-01   6.841 3.53e-10 ***
const                 -7.728e+03  5.867e+03  -1.317  0.19025    
trend                  6.071e+01  3.987e+01   1.523  0.13045    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 3453 on 120 degrees of freedom
Multiple R-Squared: 0.9325, Adjusted R-squared: 0.9285 
F-statistic: 236.7 on 7 and 120 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
                   heavy_rail_ts light_rail_ts commuter_rail_ts trolleybus_ts
heavy_rail_ts          4.800e+09     507389406        566559453      85712392
light_rail_ts          5.074e+08      68942398         67002232       8943827
commuter_rail_ts       5.666e+08      67002232         74467370      10979027
trolleybus_ts          8.571e+07       8943827         10979027       3929227
bus_ts                 3.867e+09     432376217        486210176      87834937
demand_response_ts     1.995e+08      19912655         22272597       3075176
                      bus_ts demand_response_ts
heavy_rail_ts      3.867e+09          199466607
light_rail_ts      4.324e+08           19912655
commuter_rail_ts   4.862e+08           22272597
trolleybus_ts      8.783e+07            3075176
bus_ts             4.745e+09          142091904
demand_response_ts 1.421e+08           11926154

Correlation matrix of residuals:
                   heavy_rail_ts light_rail_ts commuter_rail_ts trolleybus_ts
heavy_rail_ts             1.0000        0.8821           0.9477        0.6242
light_rail_ts             0.8821        1.0000           0.9351        0.5434
commuter_rail_ts          0.9477        0.9351           1.0000        0.6418
trolleybus_ts             0.6242        0.5434           0.6418        1.0000
bus_ts                    0.8103        0.7560           0.8180        0.6433
demand_response_ts        0.8337        0.6944           0.7474        0.4492
                   bus_ts demand_response_ts
heavy_rail_ts      0.8103             0.8337
light_rail_ts      0.7560             0.6944
commuter_rail_ts   0.8180             0.7474
trolleybus_ts      0.6433             0.4492
bus_ts             1.0000             0.5973
demand_response_ts 0.5973             1.0000
Code
summary(vars::VAR(modes_ts, p=5, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: heavy_rail_ts, light_rail_ts, commuter_rail_ts, trolleybus_ts, bus_ts, demand_response_ts 
Deterministic variables: both 
Sample size: 124 
Log Likelihood: -7318.697 
Roots of the characteristic polynomial:
0.9742 0.938 0.938 0.9302 0.9302 0.9164 0.9164 0.887 0.887 0.8323 0.8057 0.8057 0.7945 0.7945 0.7853 0.7853 0.7506 0.7506 0.7506 0.7506 0.7419 0.7419 0.735 0.735 0.728 0.728 0.5967 0.4273 0.4273 0.3217
Call:
vars::VAR(y = modes_ts, p = 5, type = "both")


Estimation results for equation heavy_rail_ts: 
============================================== 
heavy_rail_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + heavy_rail_ts.l2 + light_rail_ts.l2 + commuter_rail_ts.l2 + trolleybus_ts.l2 + bus_ts.l2 + demand_response_ts.l2 + heavy_rail_ts.l3 + light_rail_ts.l3 + commuter_rail_ts.l3 + trolleybus_ts.l3 + bus_ts.l3 + demand_response_ts.l3 + heavy_rail_ts.l4 + light_rail_ts.l4 + commuter_rail_ts.l4 + trolleybus_ts.l4 + bus_ts.l4 + demand_response_ts.l4 + heavy_rail_ts.l5 + light_rail_ts.l5 + commuter_rail_ts.l5 + trolleybus_ts.l5 + bus_ts.l5 + demand_response_ts.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1       3.205e-01  4.576e-01   0.700 0.485431    
light_rail_ts.l1      -2.796e+00  2.463e+00  -1.135 0.259347    
commuter_rail_ts.l1    7.501e+00  3.752e+00   1.999 0.048512 *  
trolleybus_ts.l1       1.859e+01  4.635e+00   4.010 0.000124 ***
bus_ts.l1             -6.408e-01  2.421e-01  -2.647 0.009550 ** 
demand_response_ts.l1 -7.260e-01  3.951e+00  -0.184 0.854614    
heavy_rail_ts.l2       4.352e-01  5.022e-01   0.867 0.388431    
light_rail_ts.l2      -2.034e-01  2.685e+00  -0.076 0.939786    
commuter_rail_ts.l2   -5.814e+00  4.258e+00  -1.366 0.175416    
trolleybus_ts.l2      -6.566e+00  5.627e+00  -1.167 0.246242    
bus_ts.l2              5.838e-01  2.710e-01   2.154 0.033855 *  
demand_response_ts.l2 -3.564e+00  4.659e+00  -0.765 0.446216    
heavy_rail_ts.l3      -4.171e-01  4.798e-01  -0.869 0.386876    
light_rail_ts.l3       5.343e+00  2.792e+00   1.913 0.058822 .  
commuter_rail_ts.l3   -2.224e+00  4.275e+00  -0.520 0.604110    
trolleybus_ts.l3       3.280e+00  5.797e+00   0.566 0.572907    
bus_ts.l3              1.387e-01  2.549e-01   0.544 0.587768    
demand_response_ts.l3  3.693e+00  4.440e+00   0.832 0.407595    
heavy_rail_ts.l4       3.583e-01  4.798e-01   0.747 0.457159    
light_rail_ts.l4       3.293e+00  2.844e+00   1.158 0.249958    
commuter_rail_ts.l4   -8.570e+00  4.232e+00  -2.025 0.045766 *  
trolleybus_ts.l4      -2.538e+00  5.857e+00  -0.433 0.665816    
bus_ts.l4              2.338e-01  2.297e-01   1.018 0.311369    
demand_response_ts.l4  3.137e+00  4.419e+00   0.710 0.479607    
heavy_rail_ts.l5       2.432e-01  4.569e-01   0.532 0.595764    
light_rail_ts.l5       1.279e+00  2.579e+00   0.496 0.621180    
commuter_rail_ts.l5   -7.663e-01  4.107e+00  -0.187 0.852386    
trolleybus_ts.l5       2.555e+00  5.779e+00   0.442 0.659498    
bus_ts.l5             -1.112e-01  2.125e-01  -0.523 0.602035    
demand_response_ts.l5 -6.305e+00  3.566e+00  -1.768 0.080319 .  
const                 -1.535e+05  2.577e+05  -0.596 0.552964    
trend                  9.890e+02  1.838e+03   0.538 0.591880    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 65760 on 92 degrees of freedom
Multiple R-Squared: 0.9082, Adjusted R-squared: 0.8773 
F-statistic: 29.37 on 31 and 92 DF,  p-value: < 2.2e-16 


Estimation results for equation light_rail_ts: 
============================================== 
light_rail_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + heavy_rail_ts.l2 + light_rail_ts.l2 + commuter_rail_ts.l2 + trolleybus_ts.l2 + bus_ts.l2 + demand_response_ts.l2 + heavy_rail_ts.l3 + light_rail_ts.l3 + commuter_rail_ts.l3 + trolleybus_ts.l3 + bus_ts.l3 + demand_response_ts.l3 + heavy_rail_ts.l4 + light_rail_ts.l4 + commuter_rail_ts.l4 + trolleybus_ts.l4 + bus_ts.l4 + demand_response_ts.l4 + heavy_rail_ts.l5 + light_rail_ts.l5 + commuter_rail_ts.l5 + trolleybus_ts.l5 + bus_ts.l5 + demand_response_ts.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1      -1.912e-02  5.020e-02  -0.381 0.704109    
light_rail_ts.l1       1.334e-01  2.703e-01   0.494 0.622789    
commuter_rail_ts.l1    9.833e-01  4.116e-01   2.389 0.018939 *  
trolleybus_ts.l1       2.013e+00  5.085e-01   3.959 0.000148 ***
bus_ts.l1             -6.657e-02  2.656e-02  -2.507 0.013946 *  
demand_response_ts.l1 -1.338e-01  4.335e-01  -0.309 0.758282    
heavy_rail_ts.l2       7.118e-02  5.510e-02   1.292 0.199705    
light_rail_ts.l2      -1.437e-01  2.946e-01  -0.488 0.626913    
commuter_rail_ts.l2   -6.915e-01  4.671e-01  -1.480 0.142177    
trolleybus_ts.l2      -1.341e+00  6.173e-01  -2.172 0.032399 *  
bus_ts.l2              6.959e-02  2.974e-02   2.340 0.021435 *  
demand_response_ts.l2 -5.320e-01  5.111e-01  -1.041 0.300664    
heavy_rail_ts.l3      -2.737e-02  5.264e-02  -0.520 0.604321    
light_rail_ts.l3       5.482e-01  3.064e-01   1.789 0.076834 .  
commuter_rail_ts.l3   -3.068e-01  4.690e-01  -0.654 0.514591    
trolleybus_ts.l3       2.117e-01  6.359e-01   0.333 0.739963    
bus_ts.l3              2.230e-02  2.797e-02   0.797 0.427348    
demand_response_ts.l3  4.217e-01  4.871e-01   0.866 0.388818    
heavy_rail_ts.l4      -2.329e-02  5.264e-02  -0.442 0.659271    
light_rail_ts.l4       5.952e-01  3.120e-01   1.908 0.059552 .  
commuter_rail_ts.l4   -4.357e-01  4.643e-01  -0.938 0.350552    
trolleybus_ts.l4      -8.444e-01  6.426e-01  -1.314 0.192064    
bus_ts.l4              9.618e-03  2.520e-02   0.382 0.703555    
demand_response_ts.l4  3.214e-01  4.849e-01   0.663 0.509014    
heavy_rail_ts.l5       1.131e-01  5.013e-02   2.256 0.026426 *  
light_rail_ts.l5       9.032e-02  2.830e-01   0.319 0.750331    
commuter_rail_ts.l5   -7.972e-01  4.505e-01  -1.769 0.080152 .  
trolleybus_ts.l5      -5.413e-01  6.341e-01  -0.854 0.395455    
bus_ts.l5              1.119e-02  2.332e-02   0.480 0.632476    
demand_response_ts.l5 -6.958e-01  3.912e-01  -1.779 0.078614 .  
const                  7.928e+03  2.828e+04   0.280 0.779812    
trend                 -1.063e+02  2.017e+02  -0.527 0.599579    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 7214 on 92 degrees of freedom
Multiple R-Squared: 0.9539, Adjusted R-squared: 0.9383 
F-statistic: 61.36 on 31 and 92 DF,  p-value: < 2.2e-16 


Estimation results for equation commuter_rail_ts: 
================================================= 
commuter_rail_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + heavy_rail_ts.l2 + light_rail_ts.l2 + commuter_rail_ts.l2 + trolleybus_ts.l2 + bus_ts.l2 + demand_response_ts.l2 + heavy_rail_ts.l3 + light_rail_ts.l3 + commuter_rail_ts.l3 + trolleybus_ts.l3 + bus_ts.l3 + demand_response_ts.l3 + heavy_rail_ts.l4 + light_rail_ts.l4 + commuter_rail_ts.l4 + trolleybus_ts.l4 + bus_ts.l4 + demand_response_ts.l4 + heavy_rail_ts.l5 + light_rail_ts.l5 + commuter_rail_ts.l5 + trolleybus_ts.l5 + bus_ts.l5 + demand_response_ts.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1      -5.594e-02  5.593e-02  -1.000 0.319816    
light_rail_ts.l1      -2.320e-01  3.011e-01  -0.771 0.442876    
commuter_rail_ts.l1    1.640e+00  4.585e-01   3.577 0.000556 ***
trolleybus_ts.l1       3.057e+00  5.665e-01   5.395 5.29e-07 ***
bus_ts.l1             -8.787e-02  2.959e-02  -2.970 0.003802 ** 
demand_response_ts.l1 -6.207e-02  4.829e-01  -0.129 0.898019    
heavy_rail_ts.l2       4.220e-02  6.139e-02   0.687 0.493538    
light_rail_ts.l2      -7.160e-02  3.281e-01  -0.218 0.827759    
commuter_rail_ts.l2   -6.855e-01  5.204e-01  -1.317 0.190992    
trolleybus_ts.l2      -1.016e+00  6.877e-01  -1.477 0.143082    
bus_ts.l2              8.206e-02  3.313e-02   2.477 0.015070 *  
demand_response_ts.l2 -5.392e-01  5.694e-01  -0.947 0.346093    
heavy_rail_ts.l3      -5.792e-02  5.864e-02  -0.988 0.325884    
light_rail_ts.l3       5.571e-01  3.413e-01   1.632 0.106017    
commuter_rail_ts.l3   -1.590e-01  5.225e-01  -0.304 0.761569    
trolleybus_ts.l3       5.069e-01  7.085e-01   0.716 0.476081    
bus_ts.l3              2.377e-02  3.116e-02   0.763 0.447519    
demand_response_ts.l3  5.819e-01  5.426e-01   1.072 0.286366    
heavy_rail_ts.l4      -1.948e-02  5.864e-02  -0.332 0.740538    
light_rail_ts.l4       4.770e-01  3.476e-01   1.372 0.173309    
commuter_rail_ts.l4   -5.485e-01  5.173e-01  -1.060 0.291733    
trolleybus_ts.l4      -6.608e-01  7.158e-01  -0.923 0.358388    
bus_ts.l4              1.092e-02  2.807e-02   0.389 0.698133    
demand_response_ts.l4  6.007e-01  5.402e-01   1.112 0.269000    
heavy_rail_ts.l5       1.083e-01  5.584e-02   1.940 0.055415 .  
light_rail_ts.l5       5.731e-02  3.153e-01   0.182 0.856160    
commuter_rail_ts.l5   -5.388e-01  5.019e-01  -1.074 0.285854    
trolleybus_ts.l5      -4.090e-02  7.064e-01  -0.058 0.953948    
bus_ts.l5             -6.513e-03  2.598e-02  -0.251 0.802581    
demand_response_ts.l5 -1.102e+00  4.358e-01  -2.529 0.013147 *  
const                 -1.532e+04  3.150e+04  -0.486 0.627991    
trend                  1.036e+02  2.247e+02   0.461 0.645710    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 8037 on 92 degrees of freedom
Multiple R-Squared: 0.9063, Adjusted R-squared: 0.8747 
F-statistic:  28.7 on 31 and 92 DF,  p-value: < 2.2e-16 


Estimation results for equation trolleybus_ts: 
============================================== 
trolleybus_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + heavy_rail_ts.l2 + light_rail_ts.l2 + commuter_rail_ts.l2 + trolleybus_ts.l2 + bus_ts.l2 + demand_response_ts.l2 + heavy_rail_ts.l3 + light_rail_ts.l3 + commuter_rail_ts.l3 + trolleybus_ts.l3 + bus_ts.l3 + demand_response_ts.l3 + heavy_rail_ts.l4 + light_rail_ts.l4 + commuter_rail_ts.l4 + trolleybus_ts.l4 + bus_ts.l4 + demand_response_ts.l4 + heavy_rail_ts.l5 + light_rail_ts.l5 + commuter_rail_ts.l5 + trolleybus_ts.l5 + bus_ts.l5 + demand_response_ts.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)   
heavy_rail_ts.l1       3.600e-03  1.345e-02   0.268  0.78950   
light_rail_ts.l1      -2.069e-02  7.239e-02  -0.286  0.77571   
commuter_rail_ts.l1    7.674e-02  1.102e-01   0.696  0.48815   
trolleybus_ts.l1       4.434e-01  1.362e-01   3.256  0.00158 **
bus_ts.l1             -6.055e-03  7.114e-03  -0.851  0.39690   
demand_response_ts.l1  6.218e-02  1.161e-01   0.536  0.59356   
heavy_rail_ts.l2       5.706e-03  1.476e-02   0.387  0.69993   
light_rail_ts.l2       4.062e-02  7.890e-02   0.515  0.60789   
commuter_rail_ts.l2   -1.196e-01  1.251e-01  -0.956  0.34162   
trolleybus_ts.l2      -1.337e-01  1.653e-01  -0.808  0.42092   
bus_ts.l2              8.436e-03  7.965e-03   1.059  0.29229   
demand_response_ts.l2 -9.720e-02  1.369e-01  -0.710  0.47951   
heavy_rail_ts.l3      -2.161e-02  1.410e-02  -1.533  0.12872   
light_rail_ts.l3      -4.320e-02  8.206e-02  -0.526  0.59981   
commuter_rail_ts.l3    1.922e-01  1.256e-01   1.530  0.12941   
trolleybus_ts.l3       5.583e-02  1.703e-01   0.328  0.74384   
bus_ts.l3              3.026e-03  7.491e-03   0.404  0.68723   
demand_response_ts.l3  6.576e-02  1.305e-01   0.504  0.61543   
heavy_rail_ts.l4      -4.735e-03  1.410e-02  -0.336  0.73778   
light_rail_ts.l4       5.124e-02  8.357e-02   0.613  0.54127   
commuter_rail_ts.l4   -1.659e-01  1.244e-01  -1.334  0.18558   
trolleybus_ts.l4      -1.481e-01  1.721e-01  -0.860  0.39186   
bus_ts.l4              1.451e-02  6.749e-03   2.150  0.03419 * 
demand_response_ts.l4  1.336e-01  1.299e-01   1.028  0.30644   
heavy_rail_ts.l5       3.202e-02  1.343e-02   2.385  0.01912 * 
light_rail_ts.l5       2.911e-03  7.580e-02   0.038  0.96945   
commuter_rail_ts.l5   -9.714e-02  1.207e-01  -0.805  0.42291   
trolleybus_ts.l5       1.933e-01  1.698e-01   1.138  0.25802   
bus_ts.l5             -1.631e-02  6.245e-03  -2.611  0.01053 * 
demand_response_ts.l5 -2.071e-01  1.048e-01  -1.976  0.05113 . 
const                  1.612e+04  7.574e+03   2.128  0.03600 * 
trend                 -1.071e+02  5.402e+01  -1.983  0.05035 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1932 on 92 degrees of freedom
Multiple R-Squared: 0.9329, Adjusted R-squared: 0.9103 
F-statistic: 41.25 on 31 and 92 DF,  p-value: < 2.2e-16 


Estimation results for equation bus_ts: 
======================================= 
bus_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + heavy_rail_ts.l2 + light_rail_ts.l2 + commuter_rail_ts.l2 + trolleybus_ts.l2 + bus_ts.l2 + demand_response_ts.l2 + heavy_rail_ts.l3 + light_rail_ts.l3 + commuter_rail_ts.l3 + trolleybus_ts.l3 + bus_ts.l3 + demand_response_ts.l3 + heavy_rail_ts.l4 + light_rail_ts.l4 + commuter_rail_ts.l4 + trolleybus_ts.l4 + bus_ts.l4 + demand_response_ts.l4 + heavy_rail_ts.l5 + light_rail_ts.l5 + commuter_rail_ts.l5 + trolleybus_ts.l5 + bus_ts.l5 + demand_response_ts.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1      -2.831e-01  4.141e-01  -0.684  0.49598    
light_rail_ts.l1      -5.894e+00  2.229e+00  -2.644  0.00964 ** 
commuter_rail_ts.l1    9.030e+00  3.395e+00   2.660  0.00923 ** 
trolleybus_ts.l1       1.864e+01  4.195e+00   4.444 2.46e-05 ***
bus_ts.l1              1.417e-01  2.191e-01   0.647  0.51937    
demand_response_ts.l1 -2.122e+00  3.576e+00  -0.593  0.55444    
heavy_rail_ts.l2       5.296e-01  4.545e-01   1.165  0.24698    
light_rail_ts.l2       3.082e+00  2.430e+00   1.268  0.20787    
commuter_rail_ts.l2   -7.631e+00  3.853e+00  -1.980  0.05064 .  
trolleybus_ts.l2      -5.133e+00  5.092e+00  -1.008  0.31606    
bus_ts.l2              6.319e-01  2.453e-01   2.576  0.01158 *  
demand_response_ts.l2 -5.333e+00  4.216e+00  -1.265  0.20910    
heavy_rail_ts.l3      -1.849e-01  4.342e-01  -0.426  0.67125    
light_rail_ts.l3       1.830e+00  2.527e+00   0.724  0.47079    
commuter_rail_ts.l3   -3.106e+00  3.869e+00  -0.803  0.42416    
trolleybus_ts.l3      -6.124e+00  5.246e+00  -1.167  0.24606    
bus_ts.l3              3.632e-01  2.307e-01   1.574  0.11887    
demand_response_ts.l3  6.375e+00  4.018e+00   1.587  0.11599    
heavy_rail_ts.l4       2.758e-02  4.342e-01   0.064  0.94950    
light_rail_ts.l4       1.613e+00  2.574e+00   0.627  0.53247    
commuter_rail_ts.l4   -4.484e+00  3.830e+00  -1.171  0.24472    
trolleybus_ts.l4       1.972e+00  5.300e+00   0.372  0.71073    
bus_ts.l4              2.653e-01  2.079e-01   1.276  0.20511    
demand_response_ts.l4  2.358e+00  4.000e+00   0.590  0.55697    
heavy_rail_ts.l5       6.895e-01  4.135e-01   1.668  0.09880 .  
light_rail_ts.l5       2.319e+00  2.334e+00   0.993  0.32316    
commuter_rail_ts.l5   -3.042e+00  3.717e+00  -0.819  0.41512    
trolleybus_ts.l5      -3.661e+00  5.230e+00  -0.700  0.48574    
bus_ts.l5             -2.449e-01  1.923e-01  -1.273  0.20612    
demand_response_ts.l5 -7.787e+00  3.227e+00  -2.413  0.01780 *  
const                 -6.138e+03  2.332e+05  -0.026  0.97906    
trend                  8.384e+01  1.664e+03   0.050  0.95992    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 59510 on 92 degrees of freedom
Multiple R-Squared: 0.949,  Adjusted R-squared: 0.9319 
F-statistic: 55.26 on 31 and 92 DF,  p-value: < 2.2e-16 


Estimation results for equation demand_response_ts: 
=================================================== 
demand_response_ts = heavy_rail_ts.l1 + light_rail_ts.l1 + commuter_rail_ts.l1 + trolleybus_ts.l1 + bus_ts.l1 + demand_response_ts.l1 + heavy_rail_ts.l2 + light_rail_ts.l2 + commuter_rail_ts.l2 + trolleybus_ts.l2 + bus_ts.l2 + demand_response_ts.l2 + heavy_rail_ts.l3 + light_rail_ts.l3 + commuter_rail_ts.l3 + trolleybus_ts.l3 + bus_ts.l3 + demand_response_ts.l3 + heavy_rail_ts.l4 + light_rail_ts.l4 + commuter_rail_ts.l4 + trolleybus_ts.l4 + bus_ts.l4 + demand_response_ts.l4 + heavy_rail_ts.l5 + light_rail_ts.l5 + commuter_rail_ts.l5 + trolleybus_ts.l5 + bus_ts.l5 + demand_response_ts.l5 + const + trend 

                        Estimate Std. Error t value Pr(>|t|)    
heavy_rail_ts.l1      -1.653e-03  2.319e-02  -0.071 0.943331    
light_rail_ts.l1      -1.310e-01  1.249e-01  -1.049 0.296963    
commuter_rail_ts.l1    1.652e-01  1.901e-01   0.869 0.387285    
trolleybus_ts.l1       7.653e-01  2.349e-01   3.257 0.001575 ** 
bus_ts.l1             -2.326e-02  1.227e-02  -1.896 0.061152 .  
demand_response_ts.l1  7.297e-01  2.003e-01   3.644 0.000444 ***
heavy_rail_ts.l2       4.478e-04  2.546e-02   0.018 0.986004    
light_rail_ts.l2       5.316e-02  1.361e-01   0.391 0.696927    
commuter_rail_ts.l2   -1.478e-01  2.158e-01  -0.685 0.494958    
trolleybus_ts.l2      -3.155e-01  2.852e-01  -1.106 0.271447    
bus_ts.l2              2.009e-02  1.374e-02   1.463 0.146998    
demand_response_ts.l2 -8.379e-02  2.361e-01  -0.355 0.723503    
heavy_rail_ts.l3       1.825e-03  2.432e-02   0.075 0.940325    
light_rail_ts.l3       2.177e-01  1.415e-01   1.538 0.127414    
commuter_rail_ts.l3   -1.444e-01  2.167e-01  -0.667 0.506728    
trolleybus_ts.l3       7.068e-02  2.938e-01   0.241 0.810426    
bus_ts.l3              8.625e-03  1.292e-02   0.668 0.506121    
demand_response_ts.l3  3.182e-02  2.250e-01   0.141 0.887854    
heavy_rail_ts.l4       1.113e-02  2.432e-02   0.458 0.648347    
light_rail_ts.l4       2.952e-01  1.441e-01   2.048 0.043409 *  
commuter_rail_ts.l4   -5.554e-01  2.145e-01  -2.589 0.011186 *  
trolleybus_ts.l4      -1.865e-01  2.968e-01  -0.628 0.531356    
bus_ts.l4              1.848e-02  1.164e-02   1.587 0.115863    
demand_response_ts.l4  1.173e-01  2.240e-01   0.524 0.601840    
heavy_rail_ts.l5       1.275e-03  2.316e-02   0.055 0.956207    
light_rail_ts.l5      -7.858e-02  1.307e-01  -0.601 0.549257    
commuter_rail_ts.l5    1.554e-01  2.081e-01   0.747 0.457239    
trolleybus_ts.l5       3.046e-01  2.929e-01   1.040 0.301041    
bus_ts.l5             -1.236e-02  1.077e-02  -1.147 0.254172    
demand_response_ts.l5 -2.034e-01  1.807e-01  -1.126 0.263193    
const                 -7.772e+03  1.306e+04  -0.595 0.553317    
trend                  5.454e+01  9.317e+01   0.585 0.559703    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 3333 on 92 degrees of freedom
Multiple R-Squared: 0.9496, Adjusted R-squared: 0.9326 
F-statistic: 55.88 on 31 and 92 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
                   heavy_rail_ts light_rail_ts commuter_rail_ts trolleybus_ts
heavy_rail_ts          4.324e+09     433085018        509940554      86759512
light_rail_ts          4.331e+08      52045740         54252113       8665785
commuter_rail_ts       5.099e+08      54252113         64593679      10741604
trolleybus_ts          8.676e+07       8665785         10741604       3734007
bus_ts                 3.286e+09     360927753        426042623      79685220
demand_response_ts     1.782e+08      16224187         19272653       2891957
                      bus_ts demand_response_ts
heavy_rail_ts      3.286e+09          178201797
light_rail_ts      3.609e+08           16224187
commuter_rail_ts   4.260e+08           19272653
trolleybus_ts      7.969e+07            2891957
bus_ts             3.541e+09          111164467
demand_response_ts 1.112e+08           11107212

Correlation matrix of residuals:
                   heavy_rail_ts light_rail_ts commuter_rail_ts trolleybus_ts
heavy_rail_ts             1.0000        0.9129           0.9649        0.6828
light_rail_ts             0.9129        1.0000           0.9357        0.6216
commuter_rail_ts          0.9649        0.9357           1.0000        0.6917
trolleybus_ts             0.6828        0.6216           0.6917        1.0000
bus_ts                    0.8398        0.8407           0.8908        0.6929
demand_response_ts        0.8131        0.6748           0.7195        0.4491
                   bus_ts demand_response_ts
heavy_rail_ts      0.8398             0.8131
light_rail_ts      0.8407             0.6748
commuter_rail_ts   0.8908             0.7195
trolleybus_ts      0.6929             0.4491
bus_ts             1.0000             0.5605
demand_response_ts 0.5605             1.0000

The correlation matrices of residuals indicate that we get better results from \(p=1\). Cross validation will provide more information.

Cross Validation

Code
n <- length(modes_ts[,1])
k <- floor(n / 4 / 4) * 4
h <- 4 
n_k <- n - k
q_steps <- n_k / h

rmse1 <- matrix(NA, n_k, 3)
rmse2 <- matrix(NA, n_k, 3)

st <- tsp(modes_ts)[1] + (k - 1) / 4  

# Cross-validation loop for time series forecasting
for (i in 1:q_steps) {  
  # Define training and testing windows
  xtrain <- window(modes_ts, end = st + i - 1)
  xtest <- window(modes_ts, start = st + (i - 1) + 1/4, end = st + i)
  
  ######## First Model (VAR with p=1) ########
  fit1 <- vars::VAR(xtrain, p = 1, type = "both")
  fcast1 <- predict(fit1, n.ahead = 2)
  
  f_heavy <- fcast1$fcst$heavy_rail_ts[, 1]
  f_light <- fcast1$fcst$light_rail_ts[, 1]
  f_commuter  <- fcast1$fcst$commuter_rail_ts[, 1]
  f_trolley <- fcast1$fcst$trolleybus_ts[, 1]
  f_bus <- fcast1$fcst$bus_ts[, 1]
  f_demand <- fcast1$fcst$demand_response_ts[, 1]
  
  ff1 <- data.frame(f_heavy, f_light, f_commuter, f_trolley, f_bus, f_demand)
  
  year <- st + (i - 1) + 1/4

  ff1 <- ts(ff1, start = c(year, 1), frequency = 4)

  a <- min(4 * i - 3)
  b <- min(4 * i)
  
  rmse1[c(a:b), ] <- sqrt((ff1 - xtest) ^ 2)
  
  ######## Second Model (VAR with p=5) ########
  fit2 <- vars::VAR(xtrain, p = 5, type = "both")
  fcast2 <- predict(fit2, n.ahead = 2)
  
  f_heavy2 <- fcast2$fcst$heavy_rail_ts[, 1]
  f_light2 <- fcast2$fcst$light_rail_ts[, 1]
  f_commuter2  <- fcast2$fcst$commuter_rail_ts[, 1]
  f_trolley2 <- fcast2$fcst$trolleybus_ts[, 1]
  f_bus2 <- fcast2$fcst$bus_ts[, 1]
  f_demand2  <- fcast2$fcst$demand_response_ts[, 1]
  
  ff2 <- data.frame(f_heavy2, f_light2, f_commuter2, f_trolley2, f_bus2, f_demand2)
  
  year <- st + (i - 1) + 1/4

  ff2 <- ts(ff2, start = c(year, 1), frequency = 4)
  
  a <- min(4 * i - 3)
  b <- min(4 * i)

  rmse2[c(a:b), ] <- sqrt((ff2 - xtest) ^ 2)
}

print(paste("Model 1 (p=1): ", mean(rmse1, na.rm = TRUE)))
[1] "Model 1 (p=1):  23067.6340150668"
Code
print(paste("Model 5 (p=5): ", mean(rmse2, na.rm = TRUE)))
[1] "Model 5 (p=5):  30816.3833773339"

Cross validation confirms that \(p=1\) does produce better results.

Chosen Model

My chosen model is VAR(1).

Forecasting

Code
fit <- VAR(modes_ts, p = 1, type = "both")

fit_pr <- predict(fit, n.ahead = 12, ci = 0.95)

actual_data <- as.data.frame(modes_ts)
actual_data$Time <- as.numeric(time(modes_ts))

f_heavy <- fit_pr$fcst$heavy_rail_ts[, "fcst"]
f_light <- fit_pr$fcst$light_rail_ts[, "fcst"]
f_commuter <- fit_pr$fcst$commuter_rail_ts[, "fcst"]
f_trolley <- fit_pr$fcst$trolleybus_ts[, "fcst"]
f_bus <- fit_pr$fcst$bus_ts[, "fcst"]
f_demand <- fit_pr$fcst$demand_response_ts[, "fcst"]

heavy_lower <- fit_pr$fcst$heavy_rail_ts[, "lower"]
heavy_upper <- fit_pr$fcst$heavy_rail_ts[, "upper"]

light_lower <- fit_pr$fcst$light_rail_ts[, "lower"]
light_upper <- fit_pr$fcst$light_rail_ts[, "upper"]

commuter_lower <- fit_pr$fcst$commuter_rail_ts[, "lower"]
commuter_upper <- fit_pr$fcst$commuter_rail_ts[, "upper"]

trolley_lower <- fit_pr$fcst$trolleybus_ts[, "lower"]
trolley_upper <- fit_pr$fcst$trolleybus_ts[, "upper"]

bus_lower <- fit_pr$fcst$bus_ts[, "lower"]
bus_upper <- fit_pr$fcst$bus_ts[, "upper"]

demand_lower <- fit_pr$fcst$demand_response_ts[, "lower"]
demand_upper <- fit_pr$fcst$demand_response_ts[, "upper"]

# Create future time points as numeric
time_index <- as.numeric(seq(from = end(modes_ts)[1] + 1, length.out = 12))

# Create data frames for forecasts
forecast_heavy <- data.frame(Time = time_index, Forecast = f_heavy, Lower = heavy_lower, Upper = heavy_upper)
forecast_light <- data.frame(Time = time_index, Forecast = f_light, Lower = light_lower, Upper = light_upper)
forecast_commuter <- data.frame(Time = time_index, Forecast = f_commuter, Lower = commuter_lower, Upper = commuter_upper)
forecast_trolley <- data.frame(Time = time_index, Forecast = f_trolley, Lower = trolley_lower, Upper = trolley_upper)
forecast_bus <- data.frame(Time = time_index, Forecast = f_bus, Lower = bus_lower, Upper = bus_upper)
forecast_demand <- data.frame(Time = time_index, Forecast = f_demand, Lower = demand_lower, Upper = demand_upper)

p1 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = heavy_rail_ts), color = "blue") +
  geom_ribbon(data = forecast_heavy, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "blue", alpha = 0.2) +
  geom_line(data = forecast_heavy, aes(x = Time, y = Forecast), color = "blue", linetype = "dashed") +
  labs(title = "Heavy Rail Forecast", x = "Year", y = "Heavy Rail Ridership") +
  theme_minimal()

p2 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = light_rail_ts), color = "red") +
  geom_ribbon(data = forecast_light, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "red", alpha = 0.2) +
  geom_line(data = forecast_light, aes(x = Time, y = Forecast), color = "red", linetype = "dashed") +
  labs(title = "Light Rail Forecast", x = "Year", y = "Light Rail Ridership") +
  theme_minimal()

p3 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = commuter_rail_ts), color = "green") +
  geom_ribbon(data = forecast_commuter, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "green", alpha = 0.2) +
  geom_line(data = forecast_commuter, aes(x = Time, y = Forecast), color = "green", linetype = "dashed") +
  labs(title = "Commuter Rail Forecast", x = "Year", y = "Commuter Rail Ridership") +
  theme_minimal()

p4 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = trolleybus_ts), color = "purple") +
  geom_ribbon(data = forecast_trolley, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "purple", alpha = 0.2) +
  geom_line(data = forecast_trolley, aes(x = Time, y = Forecast), color = "purple", linetype = "dashed") +
  labs(title = "Trolleybus Forecast", x = "Year", y = "Trolleybus Ridership") +
  theme_minimal()

p5 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = bus_ts), color = "orange") +
  geom_ribbon(data = forecast_bus, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "orange", alpha = 0.2) +
  geom_line(data = forecast_bus, aes(x = Time, y = Forecast), color = "orange", linetype = "dashed") +
  labs(title = "Bus Forecast", x = "Year", y = "Bus Ridership") +
  theme_minimal()

p6 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = demand_response_ts), color = "brown") +
  geom_ribbon(data = forecast_demand, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "brown", alpha = 0.2) +
  geom_line(data = forecast_demand, aes(x = Time, y = Forecast), color = "brown", linetype = "dashed") +
  labs(title = "Demand Response Forecast", x = "Year", y = "Demand Response Ridership") +
  theme_minimal()

# Arrange plots using patchwork
(p1 / p2) / p3

Code
(p4 / p5) / p6

These forecasts all tell roughly the same thing: we can expect a slight decline in the coming years for each mode of public transportation. The 95% confidence bands indicate that these are somewhat confident predictions, although may be misinformed due to a great deal of the movement in recent years coming from the COVID-19 pandemic. From these models, we can conclude that the modes are roughly correlated with one another, although we do see variation especially from trolleybus and buses which appear more correlated with each other than the remaining modes.

Model 3:

In this model, we will attempt to create statistics and forecasts that can be measured against univariate models fit against public transit ridership at a national level. In this case, the exogenous variables will be vehicle miles, gas prices, and unemployment rate. The purpose of using these is to contextualize public transit with other transportation options, as well as demand of transportation. Vehicle miles and gas prices serve as proxies for people’s inclination to travel via car, while unemployment rate relates to the number of people who may need to commute. These three exogenous variables are gathered at monthly rates, so averaging will be used to get quarterly values that match ridership data.

Variables

ARIMAX: Ridership ~ Vehicle Miles + Gas Prices + Unemployment Rate

Question: To what extent is public transit ridership impacted by larger macroeconomic phenomena?

Model Fitting

Code
ridership <- read_csv("../data/ridership.csv")
monthly <- read_csv("../data/monthly_data.csv")
miles <- read_csv("../data/monthly_miles.csv")
gas <- read_csv("../data/gas_prices.csv")
unemp <- read_csv("../data/unemployment.csv")
ridership$Date <- as.Date(as.yearqtr(ridership$`Quarter-Year`, format = "%Y - Q%q"))
Code
miles$observation_date <- as.Date(miles$observation_date, format = "%m/%d/%y")
miles <- miles[miles$observation_date >= as.Date("1993-04-01"),]
miles <- miles[miles$observation_date <= as.Date("2024-01-01"),]
miles$quarter <- as.yearqtr(miles$observation_date)
miles <- miles %>%
  group_by(quarter) %>%
  summarise(Miles = mean(M12MTVUSM227NFWA, na.rm = TRUE))
gas$Month <- as.Date(paste0(gas$Month, "-01"), format = "%b-%y-%d")
gas <- gas[gas$Month >= as.Date("1993-04-01"),]
gas <- gas[gas$Month <= as.Date("2024-01-01"),]
gas <- gas[order(gas$Month), ]
gas$quarter <- as.yearqtr(gas$Month)
gas <- gas %>%
  group_by(quarter) %>%
  summarise(Gas = mean(`Price per Gallon`, na.rm = TRUE))
unemp <- unemp[unemp$observation_date >= as.Date("1993-04-01"),]
unemp <- unemp[unemp$observation_date <= as.Date("2024-01-01"),]
unemp$quarter <- as.yearqtr(unemp$observation_date)
unemp <- unemp %>%
  group_by(quarter) %>%
  summarise(Unemployment = mean(UNRATE, na.rm = TRUE))
ridership <- ridership[ridership$Date >= as.Date("1993-04-01"),]
ridership <- ridership[ridership$Date <= as.Date("2024-01-01"),]
ridership$vehicle_miles <- miles$Miles
ridership$gas_prices <- gas$Gas
ridership$unemployment_rate <- unemp$Unemployment
Code
write_csv(ridership, "../data/quarterly_combined.csv")
Code
p1 <- plot_ly(ridership, x = ~Date, y = ~`Total Ridership (000s)`, type = 'scatter', mode = 'lines', name = "Passenger Trips") %>%
  layout(yaxis = list(title = "Trips"))

p2 <- plot_ly(ridership, x = ~Date, y = ~vehicle_miles, type = 'scatter', mode = 'lines', name = "Vehicle Miles") %>%
  layout(yaxis = list(title = "Miles"))

p3 <- plot_ly(ridership, x = ~Date, y = ~gas_prices, type = 'scatter', mode = 'lines', name = "Gas Prices") %>%
  layout(yaxis = list(title = "Price"))

p4 <- plot_ly(ridership, x = ~Date, y = ~unemployment_rate, type = 'scatter', mode = 'lines', name = "Unemployment Rate") %>%
  layout(yaxis = list(title = "Rate"))

final_plot <- subplot(p1, p2, p3, p4, nrows = 4, shareX = TRUE, titleX = TRUE) %>%
  layout(
    title = "Time Series of Several Variables",
    yaxis = list(title = "Trips"), 
    yaxis2 = list(title = "Miles"), 
    yaxis3 = list(title = "Price"),
    yaxis4 = list(title = "Rate"),
    xaxis = list(title = "Date")
  )

final_plot

This plot shows that each of these variables were affected by the events in 2020, to varying degrees, and have had time to settle in the years since. The relationships are not immediately obvious, but those should be uncovered by fitting a model.

Code
df.ts <- ts(ridership[, -1], start = c(1993, 4), frequency = 4)
y <- df.ts[, "Total Ridership (000s)"]
xreg <- df.ts[, c("vehicle_miles", "gas_prices", "unemployment_rate")]
fit <- auto.arima(y, xreg = xreg)

summary(fit)
Series: y 
Regression with ARIMA(0,1,0)(1,0,1)[4] errors 

Coefficients:
        sar1     sma1  vehicle_miles  gas_prices  unemployment_rate
      0.9581  -0.7228         1.3404    25464.91        -127842.779
s.e.  0.0371   0.1132         0.2178    25081.27           7045.816

sigma^2 = 4.499e+09:  log likelihood = -1540.7
AIC=3093.4   AICc=3094.13   BIC=3110.27

Training set error measures:
                   ME     RMSE      MAE        MPE     MAPE      MASE      ACF1
Training set -1841.28 65428.32 49521.02 -0.2359171 2.539576 0.3710027 0.1905822
Code
checkresiduals(fit)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(0,1,0)(1,0,1)[4] errors
Q* = 30.425, df = 6, p-value = 3.263e-05

Model df: 2.   Total lags used: 8
Code
res.fit <- ts(residuals(fit), start = decimal_date(as.Date("1993-04-01", format = "%Y-%m-%d")), frequency = 4)
Code
SARIMA.c <- function(p_range, d_range, q_range, P_range, D_range, Q_range, data) {
  s <- 12
  results_list <- list()

  for (p in p_range) {
    for (d in d_range) {
      for (q in q_range) {
        for (P in P_range) {
          for (D in D_range) {
            for (Q in Q_range) {
              if (p + d + q + P + D + Q <= 10) {
                tryCatch({
                  # Fit SARIMA model
                  model <- Arima(data, order = c(p, d, q), seasonal = c(P, D, Q))
                  
                  # Store results
                  results_list[[length(results_list) + 1]] <- c(p, d, q, P, D, Q, model$aic, model$bic, model$aicc)
                }, error = function(e) {
                  # Handle errors without breaking the loop
                  cat("Model failed for (p=", p, ", d=", d, ", q=", q, ", P=", P, ", D=", D, ", Q=", Q, ")\n")
                })
              }
            }
          }
        }
      }
    }
  }
  
  results_df <- as.data.frame(do.call(rbind, results_list))
  colnames(results_df) <- c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc")
  
  return(results_df)
}

output <- SARIMA.c(
  p_range = 0:3, q_range = 0:3, 
  d_range = 0:1, D_range = 0:1, 
  P_range = 0, Q_range = 1:2, 
  data = res.fit
)

minaic <- output[which.min(output$AIC), ]
minbic <- output[which.min(output$BIC), ]

print(minaic)
   p d q P D Q      AIC      BIC     AICc
56 1 1 1 0 1 2 2983.791 2997.687 2984.322
Code
model_output_1 <- capture.output(sarima(res.fit, 0, 1, 0, 1, 0, 1, 4))

Code
model_output_2 <- capture.output(sarima(res.fit, 1, 1, 1, 0, 1, 2, 4))

Chosen Model

Code
y <- df.ts[, "Total Ridership (000s)"]

xreg <- df.ts[, c("vehicle_miles", "gas_prices", "unemployment_rate")]

miles_fit <- auto.arima(xreg[, "vehicle_miles"])
fmiles <- forecast(miles_fit, h = 20)

gas_fit <- auto.arima(xreg[, "gas_prices"])
fgas <- forecast(gas_fit, h = 20)

unemp_fit <- auto.arima(xreg[, "unemployment_rate"])
funemp <- forecast(unemp_fit, h = 20)
Code
fxreg <- cbind(VehicleMiles = fmiles$mean, 
               GasPrices = fgas$mean, 
               UnemploymentRate = funemp$mean)

fit <- Arima(y, order = c(1,1,1), seasonal = list(order = c(0,1,2), period = 4), xreg = xreg)
summary(fit)
Series: y 
Regression with ARIMA(1,1,1)(0,1,2)[4] errors 

Coefficients:
         ar1      ma1     sma1    sma2  vehicle_miles  gas_prices
      0.8483  -0.5294  -0.9724  0.1422         1.0509    53043.62
s.e.  0.0858   0.1181   0.1147  0.1153         0.2443    22979.42
      unemployment_rate
            -132741.347
s.e.           5789.683

sigma^2 = 3.824e+09:  log likelihood = -1480.4
AIC=2976.8   AICc=2978.11   BIC=2999.03

Training set error measures:
                  ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 1707.84 58767.78 44100.25 0.0480734 2.223013 0.3303912 -0.08269953

Chosen Model: ARIMA(1,1,1)(0,1,2)[4]

RMSE: \(58,767.78\)

Compared to Univariate RMSE: \(156,042.40\)

Equation: \[y_t = 0.8483\, y_{t-1} - 0.5294\, \epsilon_{t-1} - 0.9724\, \epsilon_{t-12} + 0.1422\, \epsilon_{t-24} + 1.0509\, x_{1,t} + 53043.62\, x_{2,t} - 132741.347\, x_{3,t} + \epsilon_t\], where \(x_1\) is vehicle miles, \(x_2\) is gas prices, and \(x_3\) is unemployment rate.

Forecasting

Code
fcast <- forecast(fit, xreg = fxreg, h = 20)
Warning in forecast.forecast_ARIMA(fit, xreg = fxreg, h = 20): xreg contains
different column names from the xreg used in training. Please check that the
regressors are in the same order.
Code
autoplot(fcast) +
  ggtitle("Forecast of Ridership for the Next 20 Quarters (5 Years)") +
  xlab("Year") +
  ylab("Total Ridership") +
  theme_minimal()

In contrast with what we got from univariate modeling, this forecast predicts that ridership will gradually increase over the next five years, while maintaining the same seasonality we’ve been seeing. Due to the significantly lower RMSE value here, this is a more reliable prediction, despite still having noticeably imprecise confidence bands. This tells us that the exogenous variables are indeed useful to include as predictors in addition to past ridership values. What’s also notable is that this upward trend is clearly less steep than what we’ve seen in the post-pandemic time window, indicating a prediction that the recovery will somewhat ease up.

Model 4:

This model will use the same variables as the previous one, but instead use vehicle miles as the target variable with public transit ridership as an exogenous variable. This will allow us to consider a different transportation method as a predictor, and see if people’s inclination to drive a car is impacted by the same factors. A plot in the previous section shows these variables concurrently to display any immediate relationships that may be visible.

Variables

ARIMAX: Vehicle Miles ~ Public Transit Ridership + Gas Prices + Unemployment Rate

Question: To what extent is automobile usage impacted by larger macroeconomic phenomena?

Model Fitting

Code
df.ts2 <- ts(ridership[, -1], start = c(1993, 4), frequency = 4)
y2 <- df.ts2[, "vehicle_miles"]
xreg2 <- df.ts2[, c("Total Ridership (000s)", "gas_prices", "unemployment_rate")]
fit2 <- auto.arima(y2, xreg = xreg2)

summary(fit2)
Series: y2 
Regression with ARIMA(1,1,0)(2,0,0)[4] errors 

Coefficients:
         ar1     sar1     sar2     drift  Total Ridership (000s)  gas_prices
      0.5874  -0.4822  -0.1698  7860.491                  0.0218    1403.350
s.e.  0.0796   0.1051   0.0970  2562.468                  0.0126    4843.757
      unemployment_rate
              -3733.194
s.e.           1904.616

sigma^2 = 3.93e+08:  log likelihood = -1388.6
AIC=2793.2   AICc=2794.46   BIC=2815.7

Training set error measures:
                   ME     RMSE      MAE         MPE      MAPE      MASE
Training set 28.78505 19173.08 9533.784 0.008514815 0.3240511 0.1619729
                    ACF1
Training set -0.03945645
Code
checkresiduals(fit2)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(1,1,0)(2,0,0)[4] errors
Q* = 2.033, df = 5, p-value = 0.8446

Model df: 3.   Total lags used: 8
Code
res.fit2 <- ts(residuals(fit2), start = decimal_date(as.Date("1993-04-01", format = "%Y-%m-%d")), frequency = 4)
Code
output2 <- SARIMA.c(
  p_range = 0:3, q_range = 0:3, 
  d_range = 0:1, D_range = 0:1, 
  P_range = 0, Q_range = 1:2, 
  data = res.fit2
)
Model failed for (p= 3 , d= 1 , q= 0 , P= 0 , D= 0 , Q= 1 )
Model failed for (p= 3 , d= 1 , q= 0 , P= 0 , D= 0 , Q= 2 )
Model failed for (p= 3 , d= 1 , q= 1 , P= 0 , D= 0 , Q= 2 )
Code
minaic2 <- output[which.min(output2$AIC), ]
minbic2 <- output[which.min(output2$BIC), ]

print(minaic2)
   p d q P D Q      AIC      BIC     AICc
23 0 1 1 0 1 1 2991.802 3000.139 2992.011
Code
model_output_3 <- capture.output(sarima(res.fit2, 1, 1, 0, 2, 0, 0, 4))

Code
model_output_4 <- capture.output(sarima(res.fit2, 0, 1, 1, 0, 1, 1, 4))

Chosen Model

Code
y2 <- df.ts2[, "vehicle_miles"]

xreg2 <- df.ts2[, c("Total Ridership (000s)", "gas_prices", "unemployment_rate")]

rider_fit2 <- auto.arima(xreg2[, "Total Ridership (000s)"])
frider2 <- forecast(rider_fit2, h = 20)

gas_fit2 <- auto.arima(xreg2[, "gas_prices"])
fgas2 <- forecast(gas_fit2, h = 20)

unemp_fit2 <- auto.arima(xreg2[, "unemployment_rate"])
funemp2 <- forecast(unemp_fit2, h = 20)
Code
fxreg2 <- cbind(Ridership = frider2$mean, 
               GasPrices = fgas2$mean, 
               UnemploymentRate = funemp2$mean)

fit2 <- Arima(y2, order = c(1,1,0), seasonal = list(order = c(2,0,0), period = 4), xreg = xreg2)
summary(fit2)
Series: y2 
Regression with ARIMA(1,1,0)(2,0,0)[4] errors 

Coefficients:
         ar1     sar1     sar2  Total Ridership (000s)  gas_prices
      0.6747  -0.4669  -0.1504                  0.0191    1867.510
s.e.  0.0759   0.1118   0.1023                  0.0124    4881.099
      unemployment_rate
              -4081.588
s.e.           1914.553

sigma^2 = 412609788:  log likelihood = -1392.14
AIC=2798.28   AICc=2799.26   BIC=2817.97

Training set error measures:
                   ME     RMSE     MAE       MPE     MAPE      MASE       ACF1
Training set 4112.976 19731.13 10151.7 0.1495027 0.348495 0.1724709 -0.1209944

The chosen model for this is ARIMA(1,1,0)(2,0,0)[4]

RMSE: 19731.13

Compared to univariate RMSE: 156042.4

Equation: \[y_t = 0.6747\, y_{t-1} - 0.4669\, y_{t-12} - 0.1504\, y_{t-24} + 0.0191\, x_{1,t} + 1867.510\, x_{2,t} - 4081.588\, x_{3,t} + \epsilon_t\] where \(x_1\) is public transit ridership, \(x_2\) is gas prices, and \(x_3\) is unemployment rate.

Forecasting

Code
fcast2 <- forecast(fit2, xreg = fxreg2, h = 20)

autoplot(fcast2) +
  ggtitle("Forecast of Vehicle Miles for the Next 20 Quarters (5 Years)") +
  xlab("Year") +
  ylab("Million Vehicle Miles") +
  theme_minimal()

This forecast is a bit less telling than that for public transit ridership. We simply see a constant line that extends without much variation over the next five years with very wide confidence bands. However, the RMSE is significantly less than that of our univariate model, indicating that the inclusion of exogenous variables is indeed useful.

Model 5:

In this model, we will evaluate the relationships between the top five cities in the U.S. in public transit ridership. The purpose of this is to compare trends within cities interact with one another, and whether some cities are predictive of others.

Variables

VAR: Ridership of top 5 cities in the U.S.

Question: To what extent are trends in some U.S. metropolitan areas predictive of others?

Model Fitting

Code
cities <- read_csv("../data/monthly_data.csv")
cities$Month <- as.Date(cities$Month, format = "%m/%d/%y")
cities <- cities[order(cities$Month),]
nyc_ts <- ts(cities$`UPT - New York--Jersey City--Newark, NY--NJ`, start=c(2002,1), frequency = 12)
la_ts <- ts(cities$`UPT - Los Angeles--Long Beach--Anaheim, CA`, start=c(2002,1), frequency = 12)
chi_ts <- ts(cities$`UPT - Chicago, IL--IN`, start=c(2002,1), frequency = 12)
sf_ts <- ts(cities$`UPT - San Francisco--Oakland, CA`, start=c(2002,1), frequency = 12)
dc_ts <- ts(cities$`UPT - Washington--Arlington, DC--VA--MD`, start=c(2002,1), frequency = 12)
Code
combined_ts <- cbind(nyc_ts, la_ts, chi_ts, sf_ts, dc_ts)
combined_scaled <- scale(combined_ts)
combined_scaled_ts <- ts(combined_scaled, 
                         start = start(combined_ts), 
                         frequency = 12)
VARselect(combined_scaled_ts, lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
    10      6      1     10 

$criteria
                   1             2             3             4             5
AIC(n) -1.620730e+01 -1.642332e+01 -1.669479e+01 -1.681071e+01 -1.699751e+01
HQ(n)  -1.601894e+01 -1.610041e+01 -1.623734e+01 -1.621872e+01 -1.627097e+01
SC(n)  -1.573833e+01 -1.561937e+01 -1.555586e+01 -1.533680e+01 -1.518862e+01
FPE(n)  9.147123e-08  7.371799e-08  5.622299e-08  5.011922e-08  4.164519e-08
                   6             7             8             9            10
AIC(n) -1.715015e+01 -1.724309e+01 -1.733849e+01 -1.740427e+01 -1.744107e+01
HQ(n)  -1.628906e+01 -1.624746e+01 -1.620832e+01 -1.613956e+01 -1.604181e+01
SC(n)  -1.500627e+01 -1.476424e+01 -1.452466e+01 -1.425546e+01 -1.395728e+01
FPE(n)  3.583210e-08  3.275481e-08  2.989807e-08  2.814345e-08  2.730607e-08
Code
summary(vars::VAR(combined_scaled_ts, p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: nyc_ts, la_ts, chi_ts, sf_ts, dc_ts 
Deterministic variables: both 
Sample size: 277 
Log Likelihood: 324.513 
Roots of the characteristic polynomial:
0.9013 0.9013 0.6365 0.6365 0.4456
Call:
vars::VAR(y = combined_scaled_ts, p = 1, type = "both")


Estimation results for equation nyc_ts: 
======================================= 
nyc_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  0.5651862  0.0986927   5.727 2.72e-08 ***
la_ts.l1   0.0791725  0.0935846   0.846   0.3983    
chi_ts.l1  0.3152663  0.1453402   2.169   0.0309 *  
sf_ts.l1   0.0827019  0.1161699   0.712   0.4771    
dc_ts.l1  -0.0441416  0.1095089  -0.403   0.6872    
const     -0.2070877  0.0892177  -2.321   0.0210 *  
trend      0.0014544  0.0006108   2.381   0.0180 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.4083 on 270 degrees of freedom
Multiple R-Squared: 0.8375, Adjusted R-squared: 0.8339 
F-statistic: 231.9 on 6 and 270 DF,  p-value: < 2.2e-16 


Estimation results for equation la_ts: 
====================================== 
la_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1 -0.0382173  0.0759019  -0.504   0.6150    
la_ts.l1   0.7726496  0.0719733  10.735   <2e-16 ***
chi_ts.l1  0.1639352  0.1117771   1.467   0.1436    
sf_ts.l1  -0.1124906  0.0893431  -1.259   0.2091    
dc_ts.l1   0.1087117  0.0842202   1.291   0.1979    
const      0.1213895  0.0686149   1.769   0.0780 .  
trend     -0.0009075  0.0004697  -1.932   0.0544 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.314 on 270 degrees of freedom
Multiple R-Squared: 0.9038, Adjusted R-squared: 0.9017 
F-statistic: 422.9 on 6 and 270 DF,  p-value: < 2.2e-16 


Estimation results for equation chi_ts: 
======================================= 
chi_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1 -0.1408422  0.0639707  -2.202   0.0285 *  
la_ts.l1   0.1092873  0.0606597   1.802   0.0727 .  
chi_ts.l1  0.8345731  0.0942066   8.859   <2e-16 ***
sf_ts.l1   0.1226508  0.0752990   1.629   0.1045    
dc_ts.l1   0.0466324  0.0709814   0.657   0.5118    
const     -0.0226046  0.0578291  -0.391   0.6962    
trend      0.0001153  0.0003959   0.291   0.7710    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.2647 on 270 degrees of freedom
Multiple R-Squared: 0.9317, Adjusted R-squared: 0.9302 
F-statistic: 613.8 on 6 and 270 DF,  p-value: < 2.2e-16 


Estimation results for equation sf_ts: 
====================================== 
sf_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  0.0952346  0.0721728   1.320   0.1881    
la_ts.l1  -0.0974373  0.0684373  -1.424   0.1557    
chi_ts.l1  0.2358824  0.1062856   2.219   0.0273 *  
sf_ts.l1   0.6509138  0.0849537   7.662  3.3e-13 ***
dc_ts.l1   0.0457598  0.0800825   0.571   0.5682    
const      0.0917894  0.0652439   1.407   0.1606    
trend     -0.0006936  0.0004467  -1.553   0.1216    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.2986 on 270 degrees of freedom
Multiple R-Squared: 0.913,  Adjusted R-squared: 0.9111 
F-statistic: 472.5 on 6 and 270 DF,  p-value: < 2.2e-16 


Estimation results for equation dc_ts: 
====================================== 
dc_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  0.0900033  0.0812430   1.108   0.2689    
la_ts.l1   0.1732059  0.0770380   2.248   0.0254 *  
chi_ts.l1  0.0269565  0.1196427   0.225   0.8219    
sf_ts.l1  -0.0210804  0.0956300  -0.220   0.8257    
dc_ts.l1   0.6609317  0.0901467   7.332 2.65e-12 ***
const      0.0836427  0.0734432   1.139   0.2558    
trend     -0.0006165  0.0005028  -1.226   0.2212    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.3361 on 270 degrees of freedom
Multiple R-Squared: 0.8899, Adjusted R-squared: 0.8874 
F-statistic: 363.6 on 6 and 270 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
        nyc_ts   la_ts  chi_ts   sf_ts   dc_ts
nyc_ts 0.16673 0.09264 0.08848 0.09711 0.10952
la_ts  0.09264 0.09862 0.06577 0.07017 0.07702
chi_ts 0.08848 0.06577 0.07005 0.06651 0.07708
sf_ts  0.09711 0.07017 0.06651 0.08916 0.07838
dc_ts  0.10952 0.07702 0.07708 0.07838 0.11298

Correlation matrix of residuals:
       nyc_ts  la_ts chi_ts  sf_ts  dc_ts
nyc_ts 1.0000 0.7225 0.8188 0.7964 0.7979
la_ts  0.7225 1.0000 0.7913 0.7483 0.7296
chi_ts 0.8188 0.7913 1.0000 0.8416 0.8665
sf_ts  0.7964 0.7483 0.8416 1.0000 0.7809
dc_ts  0.7979 0.7296 0.8665 0.7809 1.0000
Code
summary(vars::VAR(combined_scaled_ts, p=6, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: nyc_ts, la_ts, chi_ts, sf_ts, dc_ts 
Deterministic variables: both 
Sample size: 272 
Log Likelihood: 568.65 
Roots of the characteristic polynomial:
0.9759 0.9759 0.9028 0.9028 0.8989 0.8989 0.8929 0.8929 0.8869 0.8467 0.8467 0.7925 0.7925 0.7784 0.7784 0.7168 0.7168 0.6951 0.6951 0.6892 0.6739 0.6428 0.6428 0.6067 0.6067 0.586 0.586 0.544 0.544 0.4051
Call:
vars::VAR(y = combined_scaled_ts, p = 6, type = "both")


Estimation results for equation nyc_ts: 
======================================= 
nyc_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + nyc_ts.l2 + la_ts.l2 + chi_ts.l2 + sf_ts.l2 + dc_ts.l2 + nyc_ts.l3 + la_ts.l3 + chi_ts.l3 + sf_ts.l3 + dc_ts.l3 + nyc_ts.l4 + la_ts.l4 + chi_ts.l4 + sf_ts.l4 + dc_ts.l4 + nyc_ts.l5 + la_ts.l5 + chi_ts.l5 + sf_ts.l5 + dc_ts.l5 + nyc_ts.l6 + la_ts.l6 + chi_ts.l6 + sf_ts.l6 + dc_ts.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  5.941e-01  1.248e-01   4.762 3.32e-06 ***
la_ts.l1   2.286e-05  1.426e-01   0.000  0.99987    
chi_ts.l1  6.565e-01  2.475e-01   2.653  0.00851 ** 
sf_ts.l1  -1.087e-01  1.699e-01  -0.640  0.52299    
dc_ts.l1  -1.109e-01  1.716e-01  -0.647  0.51851    
nyc_ts.l2 -1.913e-01  1.395e-01  -1.371  0.17160    
la_ts.l2   7.227e-03  1.886e-01   0.038  0.96946    
chi_ts.l2 -1.626e-03  2.863e-01  -0.006  0.99547    
sf_ts.l2   4.023e-01  1.931e-01   2.083  0.03828 *  
dc_ts.l2   2.071e-01  1.928e-01   1.074  0.28395    
nyc_ts.l3  2.101e-01  1.446e-01   1.452  0.14768    
la_ts.l3   5.005e-02  1.966e-01   0.255  0.79926    
chi_ts.l3 -7.772e-01  2.804e-01  -2.771  0.00602 ** 
sf_ts.l3  -1.871e-01  1.961e-01  -0.954  0.34085    
dc_ts.l3   1.348e-01  1.865e-01   0.723  0.47052    
nyc_ts.l4  1.422e-01  1.423e-01   0.999  0.31859    
la_ts.l4  -1.962e-02  1.945e-01  -0.101  0.91972    
chi_ts.l4  2.630e-01  2.818e-01   0.933  0.35165    
sf_ts.l4  -5.850e-02  1.963e-01  -0.298  0.76590    
dc_ts.l4  -5.638e-01  1.850e-01  -3.048  0.00256 ** 
nyc_ts.l5  1.359e-01  1.455e-01   0.934  0.35130    
la_ts.l5   6.166e-02  1.827e-01   0.337  0.73607    
chi_ts.l5  1.801e-01  2.837e-01   0.635  0.52616    
sf_ts.l5   3.637e-01  1.900e-01   1.914  0.05675 .  
dc_ts.l5  -2.075e-01  1.804e-01  -1.150  0.25129    
nyc_ts.l6 -2.144e-01  1.314e-01  -1.632  0.10402    
la_ts.l6   4.525e-02  1.359e-01   0.333  0.73951    
chi_ts.l6 -2.139e-01  2.511e-01  -0.852  0.39501    
sf_ts.l6  -2.888e-01  1.643e-01  -1.758  0.08003 .  
dc_ts.l6   4.821e-01  1.668e-01   2.890  0.00421 ** 
const     -1.475e-01  1.297e-01  -1.137  0.25650    
trend      1.077e-03  8.891e-04   1.211  0.22690    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.3685 on 240 degrees of freedom
Multiple R-Squared: 0.8822, Adjusted R-squared: 0.867 
F-statistic: 57.98 on 31 and 240 DF,  p-value: < 2.2e-16 


Estimation results for equation la_ts: 
====================================== 
la_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + nyc_ts.l2 + la_ts.l2 + chi_ts.l2 + sf_ts.l2 + dc_ts.l2 + nyc_ts.l3 + la_ts.l3 + chi_ts.l3 + sf_ts.l3 + dc_ts.l3 + nyc_ts.l4 + la_ts.l4 + chi_ts.l4 + sf_ts.l4 + dc_ts.l4 + nyc_ts.l5 + la_ts.l5 + chi_ts.l5 + sf_ts.l5 + dc_ts.l5 + nyc_ts.l6 + la_ts.l6 + chi_ts.l6 + sf_ts.l6 + dc_ts.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  0.0264998  0.0906261   0.292  0.77023    
la_ts.l1   0.8658130  0.1036037   8.357 5.22e-15 ***
chi_ts.l1 -0.0704368  0.1797596  -0.392  0.69553    
sf_ts.l1  -0.2953685  0.1234526  -2.393  0.01750 *  
dc_ts.l1   0.1579590  0.1246336   1.267  0.20625    
nyc_ts.l2 -0.0905283  0.1013513  -0.893  0.37264    
la_ts.l2  -0.4149466  0.1369860  -3.029  0.00272 ** 
chi_ts.l2  0.4147047  0.2079697   1.994  0.04728 *  
sf_ts.l2   0.2565477  0.1402691   1.829  0.06865 .  
dc_ts.l2   0.1840588  0.1400720   1.314  0.19009    
nyc_ts.l3  0.0517886  0.1050579   0.493  0.62250    
la_ts.l3   0.1760458  0.1428209   1.233  0.21892    
chi_ts.l3 -0.4087539  0.2037134  -2.007  0.04592 *  
sf_ts.l3  -0.1684374  0.1424187  -1.183  0.23810    
dc_ts.l3   0.1041496  0.1354853   0.769  0.44282    
nyc_ts.l4  0.2352030  0.1033833   2.275  0.02379 *  
la_ts.l4   0.1123143  0.1412954   0.795  0.42746    
chi_ts.l4  0.0243446  0.2046968   0.119  0.90543    
sf_ts.l4  -0.2484727  0.1425788  -1.743  0.08267 .  
dc_ts.l4  -0.3921866  0.1343800  -2.918  0.00385 ** 
nyc_ts.l5  0.1093651  0.1057259   1.034  0.30198    
la_ts.l5  -0.0231298  0.1327459  -0.174  0.86182    
chi_ts.l5  0.4549206  0.2061099   2.207  0.02825 *  
sf_ts.l5   0.2583937  0.1379952   1.872  0.06236 .  
dc_ts.l5  -0.2301886  0.1310732  -1.756  0.08033 .  
nyc_ts.l6 -0.2400099  0.0954216  -2.515  0.01255 *  
la_ts.l6   0.1049827  0.0987522   1.063  0.28881    
chi_ts.l6 -0.1961645  0.1823941  -1.075  0.28323    
sf_ts.l6  -0.0756840  0.1193407  -0.634  0.52656    
dc_ts.l6   0.2393999  0.1211948   1.975  0.04938 *  
const      0.1248689  0.0942114   1.325  0.18629    
trend     -0.0009495  0.0006459  -1.470  0.14287    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.2677 on 240 degrees of freedom
Multiple R-Squared: 0.9376, Adjusted R-squared: 0.9296 
F-statistic: 116.4 on 31 and 240 DF,  p-value: < 2.2e-16 


Estimation results for equation chi_ts: 
======================================= 
chi_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + nyc_ts.l2 + la_ts.l2 + chi_ts.l2 + sf_ts.l2 + dc_ts.l2 + nyc_ts.l3 + la_ts.l3 + chi_ts.l3 + sf_ts.l3 + dc_ts.l3 + nyc_ts.l4 + la_ts.l4 + chi_ts.l4 + sf_ts.l4 + dc_ts.l4 + nyc_ts.l5 + la_ts.l5 + chi_ts.l5 + sf_ts.l5 + dc_ts.l5 + nyc_ts.l6 + la_ts.l6 + chi_ts.l6 + sf_ts.l6 + dc_ts.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  0.0183188  0.0765514   0.239 0.811076    
la_ts.l1  -0.0051588  0.0875135  -0.059 0.953042    
chi_ts.l1  0.7241858  0.1518420   4.769 3.21e-06 ***
sf_ts.l1   0.1056018  0.1042797   1.013 0.312234    
dc_ts.l1   0.0381135  0.1052773   0.362 0.717648    
nyc_ts.l2 -0.2777353  0.0856108  -3.244 0.001346 ** 
la_ts.l2   0.0171294  0.1157113   0.148 0.882439    
chi_ts.l2  0.1024622  0.1756708   0.583 0.560264    
sf_ts.l2   0.1654287  0.1184845   1.396 0.163943    
dc_ts.l2   0.2115191  0.1183180   1.788 0.075083 .  
nyc_ts.l3  0.0606307  0.0887418   0.683 0.495123    
la_ts.l3   0.0070233  0.1206400   0.058 0.953624    
chi_ts.l3 -0.3289464  0.1720756  -1.912 0.057113 .  
sf_ts.l3  -0.2884947  0.1203003  -2.398 0.017244 *  
dc_ts.l3   0.3306100  0.1144437   2.889 0.004220 ** 
nyc_ts.l4  0.1272030  0.0873273   1.457 0.146527    
la_ts.l4   0.0447830  0.1193514   0.375 0.707828    
chi_ts.l4  0.3276220  0.1729063   1.895 0.059321 .  
sf_ts.l4  -0.2277076  0.1204355  -1.891 0.059869 .  
dc_ts.l4  -0.4133143  0.1135100  -3.641 0.000332 ***
nyc_ts.l5  0.1263669  0.0893060   1.415 0.158368    
la_ts.l5  -0.0100330  0.1121297  -0.089 0.928778    
chi_ts.l5  0.2422265  0.1740998   1.391 0.165420    
sf_ts.l5   0.3633172  0.1165638   3.117 0.002050 ** 
dc_ts.l5  -0.2963479  0.1107168  -2.677 0.007949 ** 
nyc_ts.l6 -0.1761885  0.0806021  -2.186 0.029789 *  
la_ts.l6   0.0558709  0.0834154   0.670 0.503635    
chi_ts.l6 -0.0925180  0.1540673  -0.601 0.548737    
sf_ts.l6  -0.0712971  0.1008064  -0.707 0.480087    
dc_ts.l6   0.1134429  0.1023726   1.108 0.268912    
const     -0.0526736  0.0795798  -0.662 0.508672    
trend      0.0003010  0.0005456   0.552 0.581635    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.2261 on 240 degrees of freedom
Multiple R-Squared: 0.9556, Adjusted R-squared: 0.9498 
F-statistic: 166.5 on 31 and 240 DF,  p-value: < 2.2e-16 


Estimation results for equation sf_ts: 
====================================== 
sf_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + nyc_ts.l2 + la_ts.l2 + chi_ts.l2 + sf_ts.l2 + dc_ts.l2 + nyc_ts.l3 + la_ts.l3 + chi_ts.l3 + sf_ts.l3 + dc_ts.l3 + nyc_ts.l4 + la_ts.l4 + chi_ts.l4 + sf_ts.l4 + dc_ts.l4 + nyc_ts.l5 + la_ts.l5 + chi_ts.l5 + sf_ts.l5 + dc_ts.l5 + nyc_ts.l6 + la_ts.l6 + chi_ts.l6 + sf_ts.l6 + dc_ts.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  1.932e-01  8.362e-02   2.310 0.021727 *  
la_ts.l1  -5.822e-02  9.559e-02  -0.609 0.543102    
chi_ts.l1  2.928e-01  1.659e-01   1.765 0.078769 .  
sf_ts.l1   3.368e-01  1.139e-01   2.957 0.003415 ** 
dc_ts.l1   1.126e-01  1.150e-01   0.979 0.328419    
nyc_ts.l2 -2.737e-01  9.351e-02  -2.926 0.003759 ** 
la_ts.l2  -4.719e-02  1.264e-01  -0.373 0.709187    
chi_ts.l2  6.048e-02  1.919e-01   0.315 0.752904    
sf_ts.l2   4.322e-01  1.294e-01   3.339 0.000973 ***
dc_ts.l2   9.115e-02  1.292e-01   0.705 0.481338    
nyc_ts.l3  1.922e-01  9.693e-02   1.983 0.048518 *  
la_ts.l3   6.129e-02  1.318e-01   0.465 0.642273    
chi_ts.l3 -5.280e-01  1.880e-01  -2.809 0.005373 ** 
sf_ts.l3  -1.904e-01  1.314e-01  -1.449 0.148649    
dc_ts.l3   1.997e-01  1.250e-01   1.597 0.111534    
nyc_ts.l4  8.206e-02  9.539e-02   0.860 0.390497    
la_ts.l4   7.068e-02  1.304e-01   0.542 0.588224    
chi_ts.l4  8.817e-02  1.889e-01   0.467 0.641054    
sf_ts.l4  -4.753e-02  1.316e-01  -0.361 0.718178    
dc_ts.l4  -3.483e-01  1.240e-01  -2.809 0.005377 ** 
nyc_ts.l5  6.538e-02  9.755e-02   0.670 0.503363    
la_ts.l5   4.589e-02  1.225e-01   0.375 0.708215    
chi_ts.l5  2.896e-01  1.902e-01   1.523 0.129061    
sf_ts.l5   4.334e-01  1.273e-01   3.404 0.000778 ***
dc_ts.l5  -2.553e-01  1.209e-01  -2.111 0.035773 *  
nyc_ts.l6 -2.456e-01  8.804e-02  -2.790 0.005693 ** 
la_ts.l6  -4.646e-02  9.112e-02  -0.510 0.610577    
chi_ts.l6 -3.826e-02  1.683e-01  -0.227 0.820333    
sf_ts.l6  -1.362e-01  1.101e-01  -1.237 0.217211    
dc_ts.l6   1.493e-01  1.118e-01   1.335 0.183161    
const     -6.140e-03  8.693e-02  -0.071 0.943749    
trend     -2.219e-05  5.959e-04  -0.037 0.970332    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.247 on 240 degrees of freedom
Multiple R-Squared: 0.9469, Adjusted R-squared: 0.9401 
F-statistic: 138.1 on 31 and 240 DF,  p-value: < 2.2e-16 


Estimation results for equation dc_ts: 
====================================== 
dc_ts = nyc_ts.l1 + la_ts.l1 + chi_ts.l1 + sf_ts.l1 + dc_ts.l1 + nyc_ts.l2 + la_ts.l2 + chi_ts.l2 + sf_ts.l2 + dc_ts.l2 + nyc_ts.l3 + la_ts.l3 + chi_ts.l3 + sf_ts.l3 + dc_ts.l3 + nyc_ts.l4 + la_ts.l4 + chi_ts.l4 + sf_ts.l4 + dc_ts.l4 + nyc_ts.l5 + la_ts.l5 + chi_ts.l5 + sf_ts.l5 + dc_ts.l5 + nyc_ts.l6 + la_ts.l6 + chi_ts.l6 + sf_ts.l6 + dc_ts.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
nyc_ts.l1  0.2428969  0.0957543   2.537 0.011826 *  
la_ts.l1   0.0130334  0.1094663   0.119 0.905325    
chi_ts.l1  0.1633751  0.1899317   0.860 0.390549    
sf_ts.l1  -0.1617195  0.1304384  -1.240 0.216255    
dc_ts.l1   0.5352509  0.1316862   4.065 6.52e-05 ***
nyc_ts.l2 -0.2471881  0.1070864  -2.308 0.021833 *  
la_ts.l2  -0.0053471  0.1447375  -0.037 0.970561    
chi_ts.l2  0.0166029  0.2197381   0.076 0.939834    
sf_ts.l2   0.1513806  0.1482064   1.021 0.308085    
dc_ts.l2   0.3169702  0.1479982   2.142 0.033223 *  
nyc_ts.l3  0.1114333  0.1110027   1.004 0.316448    
la_ts.l3   0.0248196  0.1509026   0.164 0.869496    
chi_ts.l3 -0.7450566  0.2152409  -3.462 0.000636 ***
sf_ts.l3  -0.0960272  0.1504777  -0.638 0.523985    
dc_ts.l3   0.5392735  0.1431520   3.767 0.000208 ***
nyc_ts.l4  0.2784489  0.1092334   2.549 0.011423 *  
la_ts.l4   0.0981004  0.1492908   0.657 0.511740    
chi_ts.l4  0.0511393  0.2162800   0.236 0.813286    
sf_ts.l4  -0.2624724  0.1506469  -1.742 0.082737 .  
dc_ts.l4  -0.5254030  0.1419841  -3.700 0.000267 ***
nyc_ts.l5  0.0160683  0.1117085   0.144 0.885747    
la_ts.l5  -0.0790004  0.1402575  -0.563 0.573789    
chi_ts.l5  0.5303232  0.2177729   2.435 0.015612 *  
sf_ts.l5   0.3512093  0.1458039   2.409 0.016761 *  
dc_ts.l5  -0.3328183  0.1384902  -2.403 0.017013 *  
nyc_ts.l6 -0.2447654  0.1008212  -2.428 0.015931 *  
la_ts.l6   0.0815792  0.1043403   0.782 0.435069    
chi_ts.l6  0.0503956  0.1927152   0.262 0.793929    
sf_ts.l6  -0.1187678  0.1260938  -0.942 0.347191    
dc_ts.l6   0.1926663  0.1280528   1.505 0.133746    
const      0.0955112  0.0995425   0.960 0.338272    
trend     -0.0007291  0.0006824  -1.068 0.286448    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.2829 on 240 degrees of freedom
Multiple R-Squared: 0.9306, Adjusted R-squared: 0.9216 
F-statistic: 103.8 on 31 and 240 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
        nyc_ts   la_ts  chi_ts   sf_ts   dc_ts
nyc_ts 0.13583 0.06925 0.06887 0.07108 0.08545
la_ts  0.06925 0.07168 0.04658 0.04698 0.05312
chi_ts 0.06887 0.04658 0.05114 0.04589 0.05433
sf_ts  0.07108 0.04698 0.04589 0.06102 0.05456
dc_ts  0.08545 0.05312 0.05433 0.05456 0.08002

Correlation matrix of residuals:
       nyc_ts  la_ts chi_ts  sf_ts  dc_ts
nyc_ts 1.0000 0.7019 0.8263 0.7807 0.8196
la_ts  0.7019 1.0000 0.7693 0.7104 0.7014
chi_ts 0.8263 0.7693 1.0000 0.8214 0.8493
sf_ts  0.7807 0.7104 0.8214 1.0000 0.7808
dc_ts  0.8196 0.7014 0.8493 0.7808 1.0000

Cross Validation

Code
n <- length(combined_scaled_ts[,1])
k <- floor(n / 4 / 12) * 12
h <- 12 
n_k <- n - k
q_steps <- n_k / h

rmse1 <- matrix(NA, n_k, 5)
rmse2 <- matrix(NA, n_k, 5)

st <- tsp(combined_scaled_ts)[1] + (k - 1) / 12  

# Cross-validation loop for time series forecasting
for (i in 1:q_steps) {  
  # Define training and testing windows
  xtrain <- window(combined_scaled_ts, end = st + i - 1)
  xtest <- window(combined_scaled_ts, start = st + (i - 1) + 1/12, end = st + i)
  
  ######## First Model (VAR with p=1) ########
  fit1 <- vars::VAR(xtrain, p = 1, type = "both")
  fcast1 <- predict(fit1, n.ahead = 12)
  
  f_nyc <- fcast1$fcst$nyc_ts[, 1]
  f_la <- fcast1$fcst$la_ts[, 1]
  f_chi <- fcast1$fcst$chi_ts[, 1]
  f_sf <- fcast1$fcst$sf_ts[, 1]
  f_dc <- fcast1$fcst$dc_ts[, 1]
  
  ff1 <- data.frame(f_nyc, f_la, f_chi, f_sf, f_dc)
  
  year <- st + (i - 1) + 1/12

  ff1 <- ts(ff1, start = c(year, 1), frequency = 12)

  a <- min(12 * i - 11)
  b <- min(12 * i)
  
  rmse1[c(a:b), ] <- sqrt((ff1 - xtest) ^ 2)
  
  fit2 <- vars::VAR(xtrain, p = 6, type = "both")
  fcast2 <- predict(fit2, n.ahead = 12)
  
  f_nyc2 <- fcast2$fcst$nyc_ts[, 1]
  f_la2 <- fcast2$fcst$la_ts[, 1]
  f_chi2 <- fcast2$fcst$chi_ts[, 1]
  f_sf2 <- fcast2$fcst$sf_ts[, 1]
  f_dc2 <- fcast2$fcst$dc_ts[, 1]
  
  ff2 <- data.frame(f_nyc2, f_la2, f_chi2, f_sf2, f_dc2)
  
  year <- st + (i - 1) + 1/12

  ff2 <- ts(ff2, start = c(year, 1), frequency = 12)
  
  a <- min(12 * i - 11)
  b <- min(12 * i)

  rmse2[c(a:b), ] <- sqrt((ff2 - xtest) ^ 2)
}

print(paste("Model 1 (p=1): ", mean(rmse1, na.rm = TRUE)))
[1] "Model 1 (p=1):  0.425439382021131"
Code
print(paste("Model 2 (p=6): ", mean(rmse2, na.rm = TRUE)))
[1] "Model 2 (p=6):  0.429315348036631"

Chosen Model

My chosen model is VAR(1) based on these RMSE values.

Forecasting

Code
fit <- VAR(combined_ts, p = 1, type = "both")

fit_pr <- predict(fit, n.ahead = 5, ci = 0.95)

actual_data <- as.data.frame(combined_ts)
actual_data$Time <- as.numeric(time(combined_ts))

f_nyc <- fit_pr$fcst$nyc_ts[, "fcst"]
f_la <- fit_pr$fcst$la_ts[, "fcst"]
f_chi <- fit_pr$fcst$chi_ts[, "fcst"]
f_sf <- fit_pr$fcst$sf_ts[, "fcst"]
f_dc <- fit_pr$fcst$dc_ts[, "fcst"]

nyc_lower <- fit_pr$fcst$nyc_ts[, "lower"]
nyc_upper <- fit_pr$fcst$nyc_ts[, "upper"]

la_lower <- fit_pr$fcst$la_ts[, "lower"]
la_upper <- fit_pr$fcst$la_ts[, "upper"]

chi_lower <- fit_pr$fcst$chi_ts[, "lower"]
chi_upper <- fit_pr$fcst$chi_ts[, "upper"]

sf_lower <- fit_pr$fcst$sf_ts[, "lower"]
sf_upper <- fit_pr$fcst$sf_ts[, "upper"]

dc_lower <- fit_pr$fcst$dc_ts[, "lower"]
dc_upper <- fit_pr$fcst$dc_ts[, "upper"]

# Create future time points as numeric
time_index <- as.numeric(seq(from = end(combined_ts)[1] + 1, length.out = 5))

# Create data frames for forecasts
forecast_nyc <- data.frame(Time = time_index, Forecast = f_nyc, Lower = nyc_lower, Upper = nyc_upper)
forecast_la <- data.frame(Time = time_index, Forecast = f_la, Lower = la_lower, Upper = la_upper)
forecast_chi <- data.frame(Time = time_index, Forecast = f_chi, Lower = chi_lower, Upper = chi_upper)
forecast_sf <- data.frame(Time = time_index, Forecast = f_sf, Lower = sf_lower, Upper = sf_upper)
forecast_dc <- data.frame(Time = time_index, Forecast = f_dc, Lower = dc_lower, Upper = dc_upper)

p1 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = nyc_ts), color = "blue") +
  geom_ribbon(data = forecast_nyc, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "blue", alpha = 0.2) +
  geom_line(data = forecast_nyc, aes(x = Time, y = Forecast), color = "blue", linetype = "dashed") +
  labs(title = "New York, NY Forecast", x = "Year", y = "New York, NY Ridership") +
  theme_minimal()

p2 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = la_ts), color = "red") +
  geom_ribbon(data = forecast_la, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "red", alpha = 0.2) +
  geom_line(data = forecast_la, aes(x = Time, y = Forecast), color = "red", linetype = "dashed") +
  labs(title = "Los Angeles, CA Forecast", x = "Year", y = "Los Angeles, CA Ridership") +
  theme_minimal()

p3 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = chi_ts), color = "green") +
  geom_ribbon(data = forecast_chi, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "green", alpha = 0.2) +
  geom_line(data = forecast_chi, aes(x = Time, y = Forecast), color = "green", linetype = "dashed") +
  labs(title = "Chicago, IL Forecast", x = "Year", y = "Chicago, IL Ridership") +
  theme_minimal()

p4 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = sf_ts), color = "purple") +
  geom_ribbon(data = forecast_sf, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "purple", alpha = 0.2) +
  geom_line(data = forecast_sf, aes(x = Time, y = Forecast), color = "purple", linetype = "dashed") +
  labs(title = "San Francisco, CA Forecast", x = "Year", y = "San Francisco, CA Ridership") +
  theme_minimal()

p5 <- ggplot() +
  geom_line(data = actual_data, aes(x = Time, y = dc_ts), color = "orange") +
  geom_ribbon(data = forecast_dc, aes(x = Time, ymin = Lower, ymax = Upper), 
              fill = "orange", alpha = 0.2) +
  geom_line(data = forecast_dc, aes(x = Time, y = Forecast), color = "orange", linetype = "dashed") +
  labs(title = "Washington, DC Forecast", x = "Year", y = "Washington, DC Ridership") +
  theme_minimal()

# Arrange plots using patchwork
p1 / p2

Code
p3 / p4

Code
p5

Much like the other VAR models, this is not a particularly convincing forecast. In all cases except for Chicago, we are seeing a gradually decreasing slope with any seasonality lost. Confidence bands tell us that these predictions are imprecise. It is interesting to see the disparities in the predictions, indicating that there are differences to be identified between these cities. This will be an interesting phenomenon to further explore in the interrupted time series section.